UNCLASSIFIED 


REPORT  DOCUMENTATION  PAGE  omb  NoTwtaa 

Public  reporting  burden  for  this  collection  of  information  is  estimated  to  average  1  hour  per  response,  including  the  time  for  reviewing  instructions,  searching  existing 
data  sources,  gathering  and  maintaining  the  data  needed,  and  completing  and  reviewing  the  collection  of  information.  Send  comments  regarding  this  burden  estimate 
or  any  other  aspect  of  this  collection  of  information,  including  suggestions  for  reducing  this  burden,  to  Washington  Headquarters  Services.  Directorate  for  Information 
Operations  and  Reports,  1215  Jefferson  Davis  Highway,  Suite  1204,  Arlington,  VA  22202-4302,  and  to  the  Office  of  Management  and  Budget,  Paperwork  Reduction 

Proiect  (0704-0188),  Washington,  DC  20503. 

3.  REpoRT  TyPE  and  dates  COVERED 

final  1  Jul  87  -  30  Apr  89 

4.  TltLt  AND  SUBTITLE 

Detection  and  Time  Delay  Estimation  of  Non-Gaussian  Signals  in  Noise 

5.  FUNDING  NUMBERS 

N0001 4-87-K-0785 

6—S0TR5E(3) 

Wilson,  Gary  R. 

7.  PERFORMING  ORGANIZATION  NAMES<S)  AND  Ab&ftE§S<ES) 

Applied  Research  Laboratories 

The  University  of  Texas  at  Austin 

P.O.  Box  8029 

Austin,  Texas  78713-8029 

8.  INFORMING  ORGANIZATION 

REPORT  NUMBER 

ARL-TR-90-25 

8~§PbN£6RihlG/MONlTORlNG  AOtfJcV  TOHE(3)  aNC  ADBtiE§3<E5)  -  -- 

Office  of  the  Chief  of  Naval  Research 

Department  of  the  Navy 

Arlington,  Virginia  22217-5000 

10.  SPONSORING/MONITORING 

AGENCY  REPORT  NUMBER 

11.  SUPPLEMENTARY  NOTES 

12a.  DISTRIBUTIoN/AVAILAbIUTy  STATEMENT 

Approved  for  public  release;  distribution  is  unlimited. 

12b.  DISTRIBUTION  CODE 

13.  ABSTRACT  (Maximum  200  words) 

Gaussian  time  series  have  the  property  that  their  higher  order  spectra  are  identically  zero.  In  many  signal  processing 
applications,  the  noise  fields  are  predominantly  Gaussian.  Thus  if  the  signal  has  a  nonzero  higher  order  spectra  (i.e.,  the 
signal  is  non-Gaussian),  then  higher  order  spectral  processing  of  the  combined  signal  plus  noise  field  may  provide 
processing  gains  over  more  traditional  processing  methods  that  rely  on  tower  order  properties  of  the  signal. 

(cont'd  on  reverse  side) 

14.  SUBJECT  TERMS 

bispectrum  cross- bi spectrum  higher  order  spectra 

polyspectra  time  delay  estimation  non-Gaussian 

17.  SECURITY  CLASSIFICATION 

OF  REPORT 

UNCLASSIFIED 

18.  SECURITY  CLASSIFICATION 

OF  THIS  PAGE 

UNCLASSIFIED 

19  SECURITY  CLASSIFICATION 

OF  ABSTRACT 

UNCLASSIFIED 

20.  LIMITATION  OF 

ABSTRACT 

SAR 

NSN  7540  01  280-5500 


UNCLASSIFIED 


Standard  Form  298  ( Rt-v  7  1)9) 
Prescribed  by  ANSI  Std  739  18 
298  102 


UNCLASSIFIED 


13.  (cont'd) 

In  this  report,  a  detection  statistic  is  defined  based  on  a  sample  estimate  of  the  bispectrum.  It  is 
shown  that  under  the  null  hypothesis  of  noise  only,  the  detection  statistic  is  approximately  central 
chi-square  distributed  and  under  the  alternative  hypothesis  of  signal-plus-noise  it  is  noncentral 
chi-square  distributed.  The  noncentrality  parameter  of  the  noncentral  chi-square  distribution  is 
derived  in  terms  of  the  processing  parameters  of  the  bispectrum  estimate,  the  degree  of  non- 
Gaussianity  of  the  signal  (the  skewness  function)  and  the  signal-to-noise  ratio.  Since  the  mean 
value  of  the  detection  statistic  is  equal  to  the  noncentrality  parameter,  the  performance  of  the 
detection  statistic  as  a  function  of  all  relevant  signal,  noise,  and  processing  parameters  is  clearly 
demonstrated.  A  new  bispectrum  waterfall  function  is  defined  for  use  in  displaying  the  detection 
results  in  a  familiar  lofargram  type  of  display. 

The  cross-bispectrum  is  also  exploited  to  define  time  delay  estimators.  An  estimator  based  on  the 
phase  of  the  cross-bispectrum  is  defined  and  its  statistical  properties  are  derived.  The 
dependence  of  the  resolution  of  the  time  delay  estimator  on  processing  parameters,  the  signal 
skewness,  the  signal-to-noise  ratio,  and  the  coherency  of  the  noise  is  explicitly  derived.  A  new 
time  delay  estimator  is  also  defined  which  casts  the  two-dimensional  bispectrum  into  a  one¬ 
dimensional  framework  much  like  the  bispectrum  waterfall  used  in  detection,  and  allows  it  to  have 
many  of  the  properties  of  a  crosscorrelation  function. 


UNCLASSIFIED 


TABLE  OF  CONTENTS 


Page 


LIST  OF  FIGURES .  v 

1 .  INTRODUCTION .  1 

2.  BISPECTRUM  ANALYSIS .  3 

2. 1  THEORY  OF  BISPECTRUM  PROCESSING .  3 

2.1.1  Comparison  to  the  Power  Spectrum .  3 

2.1.2  Formal  Definition  and  Properties .  5 

2.1.3  Principal  Domain  of  the  Discrete  Time  Bispectrum .  8 

2.2  ESTIMATION  OF  THE  BISPECTRUM .  13 

2.2.1  The  Unnormalized  Bispectrum  Estimate:  Averaging  in  Frequency 

and  in  Time .  13 

2.2.2  The  Normalized  Bispectrum:  Statistical  Properties .  16 

2.3  THE  BISPECTRUM  WATERFALL .  18 

3 .  DETECTION  OF  NON-GAUSSIAN  SIGNALS  IN  NON-GAUSSIAN 

NOISE .  25 

4.  TIME  DELAY  ESTIMATION  USING  THE  CROSS-BISPECTRUM .  27 

4. 1  TIME  DELAY  ESTIMATION  BASED  ON  THE  PHASE  OF  THE 

CROSS-BISPECTRUM .  27 

4.2  AN  ALTERNATIVE  TIME  DELAY  ESTIMATOR .  29 

APPENDIX  A  DETECTION  OF  NON-GAUSSIAN  SIGNALS  IN 

NON-GAUSSIAN  NOISE  USING  THE  BISPECTRA .  33 

APPENDIX  B  TIME  DELAY  ESTIMATION  USING  THE  CROSS¬ 
BISPECTRUM  .  59 

REFERENCES  .  87 


Accession  For 

NTIS  GRA&I 
DTIC  TAB  □ 

Unannounced  Q 

Justification _ _ 


By - - 

Dlstrlbut Ion/ 


Availability  Codes 


Diet 

Avail  and/or 
Speolal 

r1 

in 


to  frq 


LIST  OF  FIGURES 


.  1  Symmetries  of  the  Bispectrum .  10 

2.2  Symmetries  of  the  Bispectrum .  11 

2 . 3  Principal  Domain  of  the  Bispectrum .  12 

2.4  False  Alarm  Probability  as  a  Function  of  Normalized  Bispectrum 

Threshold .  19 

2.5  Bispectrum  Waterfall .  21 

2.6  Example  of  Bispectrum  Waterfall .  23 

4.1  Principal  Domain  and  Support  Set  for  the  Cross-Bispectrum .  28 


v 


1.  INTRODUCTION 


Gaussian  time  series  have  the  property  that  their  higher  order  spectra  are  identically 
zero.  In  many  signal  processing  applications,  the  noise  fields  are  predominantly  Gaussian. 
Thus  if  the  signal  has  a  nonzero  higher  order  spectra  (i.e.,  the  signal  is  non-Gaussian), 
then  higher  order  spectral  processing  of  the  combined  signal  plus  noise  field  may  provide 
processing  gains  over  more  traditional  processing  methods  that  rely  on  lower  order 
properties  of  the  signal. 

However,  any  practical  signal  processing  applications  of  higher  order  spectra  rely 
on  estimates  of  the  higher  order  spectra  made  from  a  finite  number  of  data  samples,  and 
thus  the  statistical  properties  of  various  estimators  of  the  higher  order  spectra  determine  the 
processing  gains  that  may  be  achieved  in  practice  for  specific  signals  and  noise.  Therefore, 
to  determine  the  utility  of  higher  order  spectral  processing  it  is  necessary  to  determine 
estimators  that  are  appropriate  for  specific  signal  processing  applications  and  determine  the 
performance  of  those  estimators  for  the  signal  and  noise  characteristics  of  the  specific 
application  based  on  the  derived  statistical  properties  of  the  estimators.  This  report 
summarizes  the  work  performed  under  Contract  N00014-87-K-0785  in  which  statistical 
estimators  for  the  detection  and  time  delay  estimation  of  non-Gaussian  signals  were  defined 
and  their  statistical  properties  in  the  presence  of  noise  were  derived.  —  / 

This  research  is  being  conducted  in  coordination  with  other  related  6.2  and  6.3 
research  projects,  which  serve  to  identify  specific  applications  of  interest  to  the  Navy  and 
serve  as  a  means  of  transitioning  this  work  to  Navy  systems.  Results  from  this  work  are 
being  applied  to  classified  data  analysis  under  these  other  programs.  The  results  of  the 
work  conducted  under  this  project  are  published  in  Hinich  and  Wilson  (1989,  1990). 

In  this  report,  a  detection  statistic  is  defined  based  on  a  sample  estimate  of  the 
bispectrum.  It  is  shown  that  under  the  null  hypothesis  of  noise  only,  the  detection  statistic 
is  approximately  central  chi-square  distributed  and  under  the  alternative  hypothesis  of 
signal-plus-noise  it  is  noncentral  chi-square  distributed.  The  noncentrality  parameter  of  the 
noncentral  chi-square  distribution  is  derived  in  terms  of  the  processing  parameters  of  the 
bispectrum  estimate,  the  degree  of  non-Gaussianity  of  the  signal  (the  skewness  function) 
and  the  signal-to-noise  ratio.  Since  the  mean  value  of  the  detection  statistic  is  equal  to  the 
noncentrality  parameter,  the  performance  of  the  detection  statistic  as  a  function  of  all 
relevant  signal,  noise,  and  processing  parameters  is  clearly  demonstrated.  A  new 
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bispectrum  waterfall  function  is  defined  for  use  in  displaying  the  detection  results  in  a 
familiar  lofargram  type  of  display. 

The  cross-bispectrum  is  also  exploited  to  define  time  delay  estimators.  An 
estimator  based  on  the  phase  of  the  cross-bispectrum  is  defined  and  its  statistical  properties 
are  derived.  The  dependence  of  the  resolution  of  the  time  delay  estimator  on  processing 
parameters,  the  signal  skewness,  the  signal-to-noise  ratio  (S/N),  and  the  coherency  of  the 
noise  is  explicitly  derived.  A  new  time  delay  estimator  is  also  defined  which  casts  the  two- 
dimensional  bispectrum  into  a  one-dimensional  framework  much  like  the  bispectrum 
waterfall  used  in  detection,  and  allows  it  to  have  many  of  the  properties  of  a  cross¬ 
correlation  function. 

The  results  of  this  study  can  be  used  by  a  system  designer  to  determine  the 
feasibility  of  higher  order  spectral  processing  for  specific  signal  processing  applications. 
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2.  BISPECTRUM  ANALYSIS 


2.1  THEORY  OF  BISPECTRUM  PROCESSING 

2.1.1  Comparison  to  the  Power  Spectrum 

While  power  spectral  analysis  of  a  time  series  is  a  concept  that  is  familiar  to  many, 
bispectral  analysis  may  not  be  as  familiar.  A  family  of  spectra,  called  polyspectra,  has  been 
defined  for  a  stationary  time  series,  of  which  the  power  spectrum  is  only  the  lowest  order 
member  of  the  family  (Brillinger,  1965;  Rosenblatt,  1966;  Brillinger  and  Rosenblatt, 
1967a, b).  The  next  higher  order  member  of  this  family  is  the  bispectrum.  In  the  following 
paragraphs,  the  power  spectrum  and  bispectrum  will  be  compared  and  contrasted  by 
explaining  their  relationships  to  the  Fourier  transform  of  the  time  series  and  the  correlation 
(or  cumulant)  functions  of  the  time  series. 

For  a  stationary  discrete  time  series,  its  power  spectrum  is  given  in  terms  of  the 
Fourier  transform  of  the  time  series  by 

P(co)  =  (X(cd)X'(cd))  =  ( |X(co)  |2)  ,  (2.1) 

where  X(co)  is  the  Fourier  transform  of  the  time  series  X(t), 

oo 

X(Oi)  =  -L  Y  x(t,)  e-'w,i  ,  -  k  <  to,  <  k  .  (2.2) 

2n  ~ 

Ijs-oo 

<•>  denotes  the  expected  value,  and  *  denotes  complex  conjugate.  As  can  be  seen  from 
Eq.  (2.1),  the  power  spectrum  contains  information  only  about  the  magnitude  of  the 
Fourier  transform  of  a  time  series. 

However,  the  bispectrum  and  higher  order  polyspectra  contain  information  about 
both  the  magnitude  and  specific  phase  relationships  (or  coherences)  between  multiple 
frequency  components  of  the  Fourier  transform  of  a  time  series.  Different  polyspectra 
describe  different  phase  relationships.  For  this  reason,  the  bispectrum  (and  higher  order 
polyspectra)  contain  additional  information  about  a  time  series  that  cannot  be  obtained  from 


the  power  spectrum.  For  a  stationary  time  series,  the  bispectrum  is  given  in  terms  of  its 
Fourier  transform  by 

B(co1,0)2)  =  (X(co1)X(co2)X‘(coi+co2))  .  (2.3) 

The  bispectrum  is  a  two-dimensional  function  of  frequency.  The  bispectrum  is  in  general  a 
complex  function  rather  than  a  real  function  like  the  power  spectrum.  The  two-dimensional 
complex  bispectrum  provides  information  about  phase  relationships  between  the  signal 
components  at  frequencies  co-i.cog,  and  (1)1+002. 

The  previous  paragraphs  have  explained  mathematically  what  the  bispectrum  is  but 
have  shed  little  light  on  what  the  bispectrum  means  and  how  it  can  be  interpreted. 
Equation  (2.3)  represents  the  bispectrum  as  the  expected  value  of  the  product  of  Fourier 
transform  values  at  the  three  frequencies  (Oi,  0)2,  and  CO1+CO2.  Thus  a  nonzero  bispectrum 
occurs  only  when  these  three  frequency  components  are  statistically  dependent.  For  any 
Gaussian  time  series,  all  frequency  components  are  independent,  so  a  Gaussian  time  series 
will  produce  a  zero  bispectrum.  However,  if  the  time  series  contains  a  quadratic 
nonlinearity  such  that  two  frequency  components  interact  to  produce  a  sum  or  difference 
frequency,  then  the  three  frequencies  will  be  statistically  dependent.  In  this  case  a  nonzero 
value  will  occur  in  the  bispectrum  at  the  pair  of  frequencies  that  define  the  nonlinear 
interaction.  Thus  the  bispectrum  will  indicate  the  presence  of  (quadratic)  nonlinearities  in 
the  time  series  and  identify  the  frequencies  contributing  to  the  nonlinear  behavior.  This  is 
the  basis  for  the  statistical  tests  for  Gaussianity  and  linearity  using  the  bispectrum  described 
by  Hinich  (1982)  and  others.  The  three  frequency  components  may  also  be  statistically 
dependent,  not  from  a  nonlinear  interaction  but  from  linear  mechanisms  which  result  in 
harmonics  of  some  fundamental  frequency.  In  this  case  the  time  series  can  be  thought  of  as 
a  periodically  time  varying  (i.e.,  nonstationary)  process.  The  bispectrum  can  indicate  the 
presence  of  these  nonstationarities  due  to  their  statistical  dependence  of  distinct  frequency 
components.  Thus  harmonically  related  signals  can  be  identified  in  the  bispectrum.  Also, 
if  a  time  series  is  linearly  time  varying  (e.g.,  transient),  it  will  have  frequency  components 
that  are  statistically  dependent.  Therefore,  transient  signals  will  produce  a  nonzero 
bispectrum.  In  this  brief  discussion  we  have  identified  that  quadratic  nonlinear  signals, 
harmonically  related  signals,  and  transients  can  result  in  nonzero  bispectral  signatures. 
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2.1.2  Formal  Definition  and  Properties 


Higher  order  spectra  of  a  random  process  are  often  referred  to  as  cumulant  spectra. 
The  nth  order  cumulant  function  of  a  random  process  X(t)  is  defined  in  terms  of  the 
characteristic  function  of  the  random  process  by 

[X(t),  X(t+t.) . Xfrv,)Wir  »'"»<*••  •••*">  la,-  -a-0  <2-4> 

5a,...8a„  'a, - an-u 


where  the  characteristic  function  is  defined  as 

<X>(a1t . . . ,  an)  =  (exp{i(a1X(t)  +  a2X(t+ti)  +  . . .  +  anX(t+tn-i))})  •  (2.5) 

The  square  brackets  []  denote  the  cumulant  function.  One  property  of  the  characteristic 
function  is  that  if  any  subset  of  the  X(tj)  are  independent  of  the  rest  of  the  X(tj),  then  the 
characteristic  function  factors.  For  example,  if  X(t), .  .  .  ,  X(t+xr_i)  are  independent  of 
X(t+tr), .  .  .  ,  X(t+Tn_i),  then  the  characteristic  function  is 

0(ai .....  an)  =  <D(ai . ar)<i>(ar+1 . an)  .  (2.6) 

If  the  characteristic  function  factors,  then  the  nth  partial  derivative  of  the  characteristic 
function  given  in  Eq.  (2.4)  is  zero  and  the  cumulant  function  is  thus  zero.  This 
demonstrates  an  important  property  of  the  cumulant  function:  the  nth  order  cumulant 
function  is  nonzero  only  when  there  is  statistical  dependence  between  the  n  elements  of  the 
cumulant  function. 

If  the  random  process  X(t)  is  stationary,  then  there  exists  a  process  Z(co)  with 
orthogonal  increments  such  that 


X(t) 


•I 


e,UodZ(co) 


(2.7) 


The  nth  order  cumulant  of  X(t)  can  then  be  written  as 


The  nth  order  cumulant  of  X(t)  is  nonzero  only  when  the  nth  order  cumulant  of  the 
orthogonal  increments  dZ(co)  is  nonzero.  However,  because  the  increments  are 
orthogonal,  their  cumulant  is  nonzero  only  if  all  of  the  n  frequencies  are  not  distinct. 
Furthermore,  because  X(t)  is  stationary,  its  nth  order  cumulant  is  independent  of  t,  which  is 
true  in  general  only  for  the  case  0)]  +  .  .  .  +  con  =  0  due  to  the  presence  of  the  exponential 

01(0,*-.  +ovjt  in  ^e  integral.  This  case  satisfies  the  requirement  that  the  n  frequencies  not  be 

all  distinct.  Thus  the  nth  order  cumulant  of  X(t)  is 

(X(t),  X(t+x1),...,X(t+tn.1)]n  = 

oo  oo 

I*  r  i(co,t,f  . . .  +  to„  ,x  ) 

J  -  J  e  11  -1  n1  [dZfo), ),..., dZ(con)]n  (2.9) 

-OO  -oo 

for  oj-t  +  .  .  .  +  con  =  0,  and  zero  otherwise. 

If  we  now  define  a  function  Pn(C0t, .  .  .  ,  G)n.j)  such  that 

dPn(co1, . . . ,  wn  .,)  =  [dZ(o>1 ) - dZ(con  1),  dZ(-co1  .  . .  -ton1)]n  ,  (2.10) 

then  if  Pn  is  continuous  it  can  be  written  as 

dPpfco, - -  con1)  =  pn(0)1 - -  wn  1)dco1  .  .  .  dcon1  ,  (2.11) 


Thus  the  nth  order  cumulant  of  X(t)  is  the  inverse  Fourier  transform  of  the  function  pn. 
This  function  is  referred  to  as  the  nth  order  (cumulant)  spectral  density  function  of  X(t).  If 
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the  cumulant  function  is  integrable,  then  the  nth  order  spectral  density  function  can  be 
written  as  the  Fourier  transform  of  the  cumulant  function: 


Pn(“i . wn-1  >  =  3n-1  <  • XW . X(t+tn-1  )]n  1  •  (2.13) 


Brillinger  ( 1975)  has  shown  that  if  X(co)  is  the  finite  Fourier  transform  of  X(t), 


N-i 

X(co)  =  ^  X(t)e  lax 

t=o 


(2.14) 


then  the  cumulant  of  the  Fourier  transforms  is  equal  to  the  nth  order  spectral  density 
function  plus  lower  order  terms: 

l  XfO), ),  X(C02) - ,  X(-0)r0)2  .  .  .  -G)n  _,))„ 

=  (2?t)n  ‘  Npn  (0)r  . . .  ,  con1)  +  0(1)  (2.15) 

Equation  (2.13)  is  the  formal  definition  of  the  nth  order  spectral  density  function 
and  assumes,  among  other  things,  that  X(t)  is  stationary  and  has  an  nth  order  cumulant  that 
is  integrable.  If  in  addition  X(t)  has  a  finite  Fourier  transform,  then  the  nth  order  spectral 
density  function  can  be  expressed  as  Eq.  (2.15).  As  a  result  of  the  independence  propert) 
of  cumulants  discussed  above,  Eq.  (2.15)  shows  that  the  nth  order  spectral  density 
function  is  zero  unless  the  Fourier  transforms  are  statistically  dependent. 

The  lowest  order  spectral  density  function  (n=2)  is  just  the  power  spectrum.  For  a 
zero  mean  process  X(t)  the  second  order  cumulant  is  equal  to  the  second  order  expected 
value: 


(X(t),  X(t+x)|2  =  (X(t)  X(t+x))  =  R(t)  .  (2.lo) 


Equation  (2.13)  then  yields  the  familiar  definition  of  the  power  spectrum  as  the  Fourier 
transform  of  the  autocorrelation  function  R(t).  Equation  (2.15)  yields  the  expression  for 
the  power  spectrum  as  the  expected  value  of  the  magnitude  squared  of  the  Fourier 
transform  of  the  time  seties,  as  given  in  Eq.  (2.1 ).  The  next  higher  order  spectral  density 
function  (n=3)  is  the  bispectrum.  For  a  zero  mean  process  the  third  order  cumulant  is 


equal  to  the  third  order  expected  value,  so  Eq.  (2.15)  simplifies  to  Eq.  (2.3)  for  the  case  of 
the  bispectrum. 

2.1.3  Principal  Domain  of  the  Discrete  Time  Bispectrum 

for  a  real,  discrete  time  series  X(j),  the  bispectrum  is  defined  over  the  (o-j  ,0^2 ") 
plane  given  by  (  -n  <  coi  <  n,  -k  <  Cl>2  <  tt}.  However,  because  of  various  symmetries  in 
the  bispectrum  many  regions  of  the  bispectrum  are  equivalent.  In  this  section  these 
symmetries  will  be  described  and  the  principal  domain  (the  region  that  contains  within  it  no 
equivalent  values)  will  be  defined. 

If  X < co )  is  the  Fourier  transform  of  the  discrete  time  series  X(j).  then  following 
Eq.  (2.15),  the  bispectrum  is 

B(co1,co2)  =(X(coi)X(g>2)X(g>3)),  (2.17) 

v.  here 

(1)3  =  27tn  -  (coi  +  (o2) ,  n  =  0,  1.  (2.  IK) 

The  value  of  n  is  restricted  to  0  and  1  because  for  any  other  values  there  are  no  values  of 
co-j.  0)2,  and  0)3  in  the  domain  from  -Jt  to  7t  that  satisfy  Eq.  (2. IK).  It  can  be  seen  from 
Eq.  (2.17)  that  the  value  of  the  bisnectrum  is  not  changed  if  any  pair  of  frequencies  is 
interchanged.  This  observation  leads  to  the  following  symmetry  relationships: 


B(0)i,0)2)  =  B(o)2,(Oi)  , 

(2.19a) 

B(C0i.C02)  =  B((03,0)i)  , 

(2.19b) 

B((i)i,C02)  =  B((03,0)2)  • 

(2.19c) 

for  n  =  0,  Eqs.  (2.19b)  and  (2.19c)  can  be  written  as 

B(o)i,0)2>  =  B(-o)i-0)2.o>i)  ,  (2.19d) 


B(d)i  ,0)2 )  --  B (  (1)1  -0)2, 0)p)  . 


For  n  =  1 ,  these  symmetry  relationships  are  given  by 


B(w1,co2)  =  B (271-coi -0)2, o)i )  >  (2. 190 

B(o)i,o)2)  =  B(2t:-o)i-o)2,o)2)  .  (2.19g) 

One  additional  symmetry  relationship  results  from  the  assumption  of  a  real  time  series: 

B(o)i,0)2)  =  B  (-o)i,-co2)  .  (2. 1 9h) 

Using  these  symmetries,  a  principal  domain  of  the  bispectrum  can  be  defined. 
From  Eq.  ( 2 . 1 9 h ) ,  it  follows  that  the  regions  of  the  (0)1t  0)2)  plane  shown  in  Fig.  2.1(a) 
are  equivalent.  Thus  it  is  only  necessary  to  consider  the  upper  half  of  the  plane  (positive 
co2).  From  Eq.  (2.19e),  the  two  triangular  regions  shown  in  Fig.  2.1(b)  are  equivalent. 
Combining  Eq.  (2.19e)  with  Eqs.  (2.1 9h)  and  (2.19a),  it  can  be  seen  that  the  two 
triangular  regions  in  Fig.  2.1(c)  are  also  equivalent.  Thus  it  is  necessary  to  consider  only 
positive  values  of  both  0)-j  and  co2. 

Applying  Eq.  (2.19a)  to  the  positive  (to^,  0)2)  quadrant  results  in  two  halves  of  the 
quadrant  being  equivalent,  as  shown  in  Fig.  2.2(a).  Equation  (2.190  implies  that  the  two 
triangles  in  Fig.  2.2(b)  are  equivalent,  while  Eq.  (2.19g)  implies  that  the  two  triangles  in 
Fig.  2.2(c)  are  equivalent.  No  other  symmetry  relationships  can  be  applied  to  further 
eliminate  equivalent  regions  in  the  bispectrum.  Thus  the  principal  domain  is  the  triangular 
region  shown  in  Fig.  2.3(a). 

Although  this  triangular  region  is  the  principal  domain,  if  the  time  series  is  a 
stationary,  continuous  time  function  that  has  been  low  pass  filtered  prior  to  digitizing  such 
that  there  is  no  energy  above  the  Nyquist  frequency  (no  aliasing),  then  the  bispectrum 
should  only  be  nonzero  in  the  triangular  region  shown  in  Fig.  2.3(b)  (Hinich  and 
Wolinsky,  1 988).  This  result  is  due  to  the  fact  that  it  is  only  within  this  triangular  region 
that  the  sum  cu1  +  0)2  is  less  than  the  Nyquist  frequency.  Thus  with  proper  filtering  it  is 
only  necessary  to  compute  the  bispectrum  over  the  isosceles  triangle  shown  in  Fig.  2.3(b). 
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FIGURE  2.3 

PRINCIPAL  DOMAIN  OF  THE  BISPECTRUM 
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2.2  ESTIMATION  OF  THE  BISPECTRUM 


2.2.1  The  Unnormalized  Bispectrum  Estimate:  Averaging  in  Frequency 
and  in  Time 

The  estimator  used  to  estimate  the  hispectrum  for  the  analysis  presented  in  this 
report  was  developed  by  Hinich  (1982)  and  is  based  on  a  discrete  Fourier  transform  of  the 
data.  For  the  discrete  time  series  x(t),  t=0,  1 ,  N-1 ,  the  discrete  Fourier  transform  is 
given  by 


N-1 


x®=FfX 


-i2njt 

x(t)  e^r. 


j  =  0,  1 , 


N  -  1 


(2.20) 


t=o 


The  frequency  associated  with  the  jth  component  is 


'riU'i*0'1 . 1 

-(N-j) 


N 


f..j- J+1 . N-1 


(2.21) 


where  fs  is  the  rate  at  which  the  time  series  was  sampled.  The  discrete  Fourier  transform  is 
computed  using  an  FFT  algorithm. 

A  consistent  estimate  of  the  bispectrum  (one  whose  expected  value  approaches  the 
true  value  and  whose  variance  goes  to  zero  as  N  approaches  infinity)  can  be  constructed 
using  an  FFT  of  the  time  series.  The  expected  value  of  the  complex  function 

F(j,k)  =  N2  X(j) X(k)  X’fj+k)  (2.22) 

is  equal  to  the  bispectrum  B(fj,f|<)  plus  terms  on  the  order  of  N'1.  Thus  F(j,k)  is  an 
unbiased  estimate  of  the  bispectrum.  However,  it  is  not  a  consistent  estimate  of  the 
bispectrum  because  its  variance  increases  with  N.  To  obtain  a  consistent  estimate,  either 
the  function  F(j,k)  can  be  averaged  in  frequency,  or  multiple  realizations  of  F(j,k)  can  be 
averaged  in  time,  or  both. 

To  average  in  frequency,  the  function  F(j.k )  is  averaged  over  adjacent  values  in  a 
square  of  M2  points  centered  at  the  points 


(g.n,gn) 


_  ( (2m-1)M  -1  (2n-1)M  -  1 


m  =  1,  ■  ■  +  .5 


n  =  1,--,m  n  < -  3m  +  3- + -3-  , 
LM  2  2MJ 


where  [•]  denotes  the  "greatest  integer  not  exceeding"  function.  Thus  the  frequency 
averaged  bispectrum  estimator  is  given  by 

mM-1  nM-1 


&(m.n)  =  M'2  £  F(j,k) 

j=(m-1)M  k=(n-1)M 


j,k:  0<j<y,0<k<j,2j  +  k<N 


(2.23) 


If  the  bispectrum  is  slowly  varying  over  the  square  and  if  the  power  spectrum  is  slowly 
varying  over  the  band  of  width  M  centered  at  the  appropriate  frequencies,  then  the  variance 
of  this  estimator  is  given  by 

VARB(m,n)=NM‘lQ(m,n|P(lgJP(fg-)P(fB^gn)  +  aM/N)  ,  (2.24) 

where  Q(m,n)=M2  if  the  square  is  entirely  within  the  principal  domain;  otherwise  it  is 
equal  to  the  number  of  (j,k)  within  the  square  but  not  on  the  boundaries  j=k  or  2j+k=N 
plus  twice  the  number  of  (j,k)  on  the  boundaries.  P(-)  is  the  power  spectrum  of  the  time 
series.  From  Eq.  (2.24)  it  can  be  seen  that  the  estimator  given  by  Eq.  (2.23)  is  a  consistent 
estimator  for  values  of  M  given  by 


fN  <M<  N  . 


The  bias  increases  and  the  variance  decreases  as  M  increases. 


(2.23) 


To  average  the  bispectrum  in  time  also,  the  time  series  can  be  divided  into 
L  segments,  each  of  length  N,  and  the  bispectrum  is  estimated  (and  possibly  averaged  in 
frequency)  for  each  segment  and  all  L  estimates  are  averaged  together.  If  each  of  the 
L  bispectrum  estimates  is  uncorrelated,  the  variance  of  this  coherent  average  is  just 
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(2.26) 


VAR  BL(m,n)  = 


VAR  B(m,n) 


In  this  case,  consistency  is  obtained  for  values  of  L  and  M  given  by 


<  M  <  N 


(2.27) 


Thus  consistency  can  be  obtained  by  averaging  in  time  as  well  as  frequency. 


The  tradeoffs  between  frequency  and  time  averaging  can  best  be  understood  by 
examining  the  variance  of  the  bispectrum  estimate.  From  Eqs.  (2.24)  and  (2.26),  it  can  be 
seen  that  the  variance  of  the  bispectrum  estimate  is  proportional  to 

_N_ 

LM2 

where  N  is  the  length  of  the  FFT,  L  is  the  number  of  bispectra  that  are  coherently  averaged 
in  time,  and  M  is  the  size  of  the  square  over  which  the  bispectrum  is  averaged  in  frequency. 
The  total  number  of  samples  used  to  estimate  the  bispectrum  is  NL.  The  resolution  of  the 
bispcctrum  in  frequency  is  given  by 


•  Mk 

N  ’ 

where  fs  is  the  sampling  rate.  If,  instead  of  estimating  the  bispectrum  with  FFTs  of  length 
N  and  coherently  averaging  L  bispectra,  the  bispectrum  is  estimated  with  a  single  FFT  of 
length  N’=NL  but  with  the  same  sampling  rate,  and  then,  if  this  bispectrum  is  averaged  in 
frequency  over  a  square  of  size  M'=ML,  the  variance  and  the  resolution  are  the  same  as  the 
bispectrurn  estimate  using  FFTs  of  size  N.  Thus,  averaging  in  frequency  and  averaging  in 
time  are  equivalent  if  the  appropriate  choice  of  M  is  made.  The  advantage  of  dividing  the 
time  series  into  L  segments  of  length  N  and  averaging  in  time  is  that  it  requires  significantly 
fewei  calculations  and  can  thus  be  performed  faster  and  with  less  storage  requirements  than 
computing  one  bispectrum  using  NL  samples.  The  potential  liability  of  this  approach  is 
that  if  the  resolution  of  the  FFT  is  significantly  coarser  than  the  bandwidth  of  the  nonlinear 
component,  the  presence  of  the  nonlinearity  will  be  reduced  in  the  bispectrum.  Thus  the 
best  choice  of  N,  M,  and  L  is,  as  is  usually  the  case,  to  some  extent  data  dependent. 
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As  can  be  seen  from  the  above  expressions,  decreasing  the  value  of  M  improves  the 
resolution  but  increases  the  variance  of  the  bispectrum  estimate.  This  increase  in  variance 
can  be  offset  by  coherently  averaging  the  bispectrum  over  a  longer  time,  i.e.,  increasing  L. 
However,  since  L  is  related  to  the  variance  by  L'1  whereas  M  is  related  by  M‘2,  decreasing 
M  by  a  factor  of  2,  for  example,  requires  increasing  L  by  a  factor  of  4  to  achieve  the  same 
variance.  In  practical  situations  there  are  almost  always  limits  to  how  much  L  can  be 
increased,  the  limits  being  determined  by  the  stationarity  of  the  time  series.  Thus  the 
frequency  resolution  of  the  bispectrum  that  can  be  achieved  in  practice  depends  on  the 
stationarity  of  the  time  series.  Furthermore,  the  improvement  in  resolution  that  can  be 
achieved  is  not  linear  with  the  increase  in  coherent  averaging  time,  but  instead  goes  as  the 
square  root  of  the  increase  in  averaging  time. 

2.2.2  The  Normalized  Bispectrum:  Statistical  Properties 

The  bispectrum  can  be  normalized  to  produce  a  quantity  whose  asymptotic  statistics 
can  easily  be  calculated.  The  asymptotic  distribution  of  the  estimator  given  by  Eq.  (2.23)  is 
complex  normal  and  independent  for  each  frequency  pair.  Thus  the  distribution  of  the 
normalized  bispectrum  given  by 


=  «  (2.28) 

VAR  B(m,n) 

is  asymptotically  noncentral  chi-square  with  two  degrees  of  freedom  and  noncentrality 
parameter 


where 


X(m,n) 


2 

N  M'4  Q(m,n) 


(2.29) 


Y(fgm.fgn) 


|B(fgm.ffln)l 

^(U%)P(U) 


(2.30) 


is  called  the  skewness  function.  If  the  time  series  is  Gaussian,  its  skewness  function  will 
be  zero  for  all  frequency  pairs  and  the  distribution  of  the  normalized  bispectrum  is  just  a 
central  chi-square  with  two  degrees  of  freedom  (the  noncentrality  parameter  will  be  zero). 
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This  in..  lies  that  an  a  priori  threshold  can  be  estimated  for  the  normalized  bispectrum  such 
that  values  of  the  normalized  bispectrum  that  exceed  the  threshold  can  be  said  to  be 
representative  of  a  non-Gaussian  time  series  with  a  probability  of  false  alarm  that  is  related 
to  the  threshold.  Thus  the  normalized  bispectrum  can  be  used  to  discriminate  between 
Gaussian  and  non-Gaussian  time  series  (Hinich,  1982). 

In  practice,  an  estimate  of  the  normalized  bispectrum  given  in  Eq.  (2.28)  requires 
an  estimate  of  the  variance  of  the  bispectrum  estimate,  which  in  turn  requires  an  estimate  of 
the  power  spectrum  in  the  expression  for  the  variance.  If  the  power  spectrum  is  estimated 
by 


P(j)  =  N|X(j)|2  (2.31) 

and  is  smoothed  over  a  band  of  at  least  M  adjacent  values  (or  averaged  in  time),  then  this 
estimate  of  the  normalized  bispectrum  using  the  estimated  power  spectrum  will  also  be 
asymptotically  noncentral  chi-square  distributed  with  two  degrees  of  freedom  and  the  same 
noncentrality  parameter.  If  the  bispectrum  is  coherently  averaged  over  L  segments,  then 
the  distribution  of  the  normalized  bispectrum  after  averaging  is  still  noncentral  chi-square 
with  two  degrees  of  freedom  but  with  noncentrality  parameter  LX(m,n). 

A  sliding  average  can  be  used  to  average  both  the  power  spectrum  and  bispectrum 
in  time.  The  temporal  average  of  the  power  spectrum  for  L  records  is  given  by 

pi_{j)  +  o /j]  a1**1 1  P[®i  (j) 

P{La)(i)  = - - - -  ,  (2.32) 

X  ak1 

k=*  1 


where  0  <  a  <  1  and  Pjf*  =  0.  A  similar  expression  for  the  time  averaged  bispectrum  can 
be  obtained.  Block  averaging  results  when  (a=1 ). 

If  a  sliding  average  is  applied  to  the  bispectrum,  then  its  variance  is  given  by 


17 


(2.33) 


t  (ak'1>2 

VARB[a,(m,n)  =  ^ - —VAR  B(m,n) 

(s-r 

where  VAR  B  (m,n)  is  the  variance  of  the  bispectrum  given  by  Eq.  (2.24)  (ignoring  terms 
on  the  order  of  M/N)  with  the  averaged  power  spectrum  used  as  the  estimate  of  the  power 
spectrum.  For  block  averaging  (a=l),  the  ratio  of  the  two  summations  in  Eq.  (2.33)  is  just 

[A 

From  a  detection  point  of  view,  the  issue  is  how  large  the  normalized  bispectrum 
statistic  given  by  Eq.  (2.28)  has  to  be  in  order  to  confidently  reject  the  Gaussian  noise  only 
hypothesis  and  assert  that  a  non-Gaussian  signal  is  present.  Shown  in  Fig.  2.4  is  a  plot  of 
the  probability  of  false  alarm  as  a  function  of  the  normalized  bispectrum  value  based  on  the 
central  chi-squared  distribution  function  with  two  degrees  of  freedom.  The  plot  is 
approximately  linear  on  a  log  false  alarm  scale  with  a  slope  of  4.6  per  decade.  To  operate 

'l 

at  a  false  alarm  rate  of  10  ,  one  would  reject  the  hypothesis  that  noise  only  is  present  (the 
normalized  bispectrum  statistic  is  central  chi-square  distributed  rather  than  noncentral  chi- 
square  distributed)  for  values  of  the  normalized  bispectrum  statistic  that  are  13.8  or  larger. 
Since  the  mean  value  of  the  normalized  bispectrum  statistic  is  equal  to  the  noncentrality 
parameter,  then  the  noncentrality  parameter  must  be  13.8  or  larger  to  achieve  a  probability 
of  detection  of  0.5  at  this  false  alarm  rate.  This  result  is  based  on  detection  at  a  single  point 
in  the  bispectrum.  Summing  the  bispectrum  over  its  principal  domain  can  also  be  used  to 
detect  the  presence  of  a  non-Gaussian  signal  (Hinich,  1982). 

2.3  THE  BISPECTRUM  WATERFALL 

It  is  typical  to  view  the  time  history  of  the  power  spectrum  in  a  waterfall  display 
format,  where  the  vertical  axis  is  time  and  the  horizontal  axis  is  frequency.  Because  the 
bispectrum  is  a  function  of  two  frequencies,  it  is  not  possible  to  directly  display  it  in  a 
waterfall  format.  However,  it  is  possible  to  reduce  the  two-dimensional  bispectrum  to  a 
form  that  is  amenable  to  a  waterfall  display  format.  This  is  done  by  summing  over  the 
principal  domain  all  values  of  the  normalized,  averaged  bispectrum  X2(m«n)  such  that  the 
sum  of  the  two  frequencies  (fgm,fgn)  is  constant 
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NORMALIZED  BISPECTRUM  THRESHOLD 


FIGURE  2.4 

FALSE  ALARM  PROBABILITY  AS  A  FUNCTION  OF 
NORMALIZED  BISPECTRUM  THRESHOLD 


(2.34) 


X2(0  =  X  X2(m.n) 

m,n:  m+n-i 


The  frequency  pairs  that  satisfy  this  constraint  lie  along  a  diagonal  as  shown  in  Fig.  2.5. 
This  summation  is  repeated  for  all  frequencies  (fgm,fgn)  that  are  represented  in  the  Fig. 2. 5 

bispectrum.  In  this  way  the  two-dimensional  bispectrum  has  been  collapsed  to  one 
dimension  and  can  then  be  displayed  (with  loss  of  information)  as  a  function  of  time  in  a 
waterfall  format,  much  like  a  power  spectrum  waterfall. 

Since  the  bispectral  estimate  at  each  frequency  pair  is  asymptotically  independent, 
then  under  the  assumption  of  Gaussianity  each  value  in  this  waterfall  will  asymptotically 
have  a  central  chi-square  distribution  with  21  degrees  of  freedom,  where  I  is  the  number  of 
bispectral  values  included  in  the  summation  in  Eq.  (2.34).  Since  the  mean  and  variance  of 
a  chi-square  distributed  random  variable  are  proportional  to  the  degrees  of  freedom,  values 
of  the  waterfall  at  higher  frequency  values  will  have  a  larger  mean  and  variance  than  values 
at  lower  frequency  values.  This  tends  to  produce  an  uneven  display  that  hinders  detection. 
The  summation  in  Eq.  (2.34)  can  be  scaled  by  the  number  of  terms  in  the  summation,  thus 
causing  the  mean  values  of  the  waterfall  to  be  independent  of  frequency  and  resulting  in  a 
variance  that  decreases  at  higher  frequencies.  The  summation  can  also  be  restricted  to 
include  only  those  values  that  exceed  a  specified  threshold,  so  that  the  number  of  degrees 
of  freedom  is  determined  by  the  number  of  large  bispectral  values  rather  than  by  the 
frequency.  This  latter  approach  is  illustrated  in  Fig.  2.6,  where  a  signal  with  a  bispectrum 
at  a  single  point  has  been  added  to  Gaussian  noise.  The  presence  of  the  bispectral  signal  is 
much  more  obvious  in  the  waterfall  than  in  the  full  bispectrum,  demonstrating  the 
usefulness  of  the  waterfall  presentation  for  detecting  the  presence  of  bispectral  signals  in 
noise. 
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=  SAMPLING  FREQUENCY 
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FIGURE  2.5 

BISPECTRUM  WATERFALL 
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FIGURE  2.6 

EXAMPLE  OF  BISPECTRUM  WATERFALL 


23 


AS-90-653 
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DETECTION  OF  NON-GAUSSIAN  SIGNALS  IN  NON-GAUSSIAN 

NOISE 


It  is  shown  in  Hinich  and  Wilson  (1990)  that  the  statistic 


CH(m,n) 


2LM4Q'1(m,n) 

N 


B^N\m,n)  -  Bn(m,n)j2 
Ps-t-n(^gm)Ps+n(^gr)f:>s+nOgm+gn) 


(3.1) 


where  Bn  is  the  bispectrum  of  the  noise,  Ps+n  is  the  signal-plus-noise  power  spectrum, 
and  B(N)  is  a  consistent  estimate  of  the  measured  bispectrum  such  as  given  by  Eq.  (2.23), 
is  approximately  a  central  chi-square  random  variable  with  two  degrees  of  freedom  under 
the  null  hypothesis  of  noise  only.  Under  the  alternative  hypothesis  of  signal  plus  noise  it  is 
a  noncentral  chi-square  random  variable  w'ith  noncentralitv  parameter  given  by 


3.(m,n)  = 


2LM4  Q'  (m,n) 


N(Up-1(fgm)Xl+p-1(WX1+P‘1(W) 


"ls(fgm.fgr,) 


(3.2) 


where  ys  is  the  skewness  function  of  the  signal  and  p  is  the  signal-to-noise  pow'er  ratio: 


p(f)  = 


Ml 

Pn(f) 


(3.3) 


In  Eqs.  (3. 1 )  and  (3.2)  it  is  assumed  that  a  block  average  in  time  of  L  records  is  applied.  If 
a  sliding  average  is  applied,  the  factor  L  is  replaced  by  the  appropriate  ratio  of  summations 
that  appears  in  Eq.  (2.33).  Detection  occurs  when  the  value  of  the  statistic  given  by 
Eq.  (3.1)  exceeds  a  threshold  determined  by  the  central  chi-square  distribution  (see 
Fig.  (2.5)).  Since  the  mean  value  of  Eq.  (3.1)  is  equal  to  Eq.  (3.2),  the  value  of  the 
noncentrality  parameter  given  by  Eq.  (3.2)  determines  the  detectability  of  the  signal. 

From  Eq.  (3.2)  it  can  be  seen  that  several  factors  contribute  to  the  value  of  the 
noncentralitv  parameter.  One  is  the  skewness  function  ys  which  is  a  characteristic  of  the 
signal,  one  is  the  S/N  p,  w  hich  is  a  characteristic  of  the  signal  and  noise  power  levels,  and 
the  rest  are  characteristics  of  the  processing.  Given  that  a  signal  has  a  nonzero  skewness 
function,  then  there  is  a  tradeoff  between  S/N  and  processing  parameters  that  determines  if 
the  nonzero  skewness  will  result  in  a  sufficiently  large  noncentrality  parameter  to  allow  its 
detection  at  a  given  false  alarm  rate.  For  small  values  of  the  skewness  function,  the 


processing  parameters  and  S/N  have  to  be  such  that  their  product,  given  by  Eq.  (3.2), 
remains  large  enough  to  produce  a  sufficiently  large  noncentrality  parameter  for  detection. 
The  noncentrality  parameter  has  a  linear  dependence  on  the  number  of  temporal  averages  L, 
a  quadratic  dependence  on  the  number  of  frequency  averages  M  (Q  is  equal  to  M  except 
on  the  boundaries  of  the  principal  domain),  and  an  approximately  cubic  dependence  on  S/N 
(for  low  S/N).  This  implies  that  if  the  S/N  decreases  by  a  factor  of  2  (3  dB),  then  it  is 
necessary  to  either  increase  the  number  of  temporal  averages  by  a  factor  of  8,  increase  the 
number  of  averages  in  frequency  by  a  factor  of  V8,  or  increase  both  the  frequency  and  time 
averaging  such  that  the  product  LM  increases  by  a  factor  of  8,  in  order  to  retain  the  same 
level  of  detectability. 

Equation  (3.2)  demonstrates  the  essential  relationship  between  signal 
characteristics,  noise  characteristics,  and  processing  parameters  that  determines  the 
detection  performance  of  the  bispectrum.  To  determine  the  viability  of  bispectrum 
processing  for  detection  of  non-Gaussian  signals,  it  is  essential  to  know  the  skewness 
function  of  signals  of  interest.  Given  a  skewness  function,  the  processing  parameters 
necessary  to  achieve  detection  as  a  function  of  S/N  can  then  be  determined.  It  should  also 
be  noted  that  Eq.  (3.2)  is  relevant  for  "narrowband"  detection,  i.e.,  detection  at  a  single 
point  in  the  bispectrum.  One  can  also  consider  "broadband"  detection  in  which  the 
detection  statistic  is  based  on  bispectrum  values  over  the  entire  principal  domain.  Examples 
of  the  detection  performance  as  a  function  of  these  parameters  for  the  broadband  case  can 
be  found  in  Hinich  and  Wilson  (1990),  which  is  included  in  Appendix  A  of  this  report. 
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4.  TIME  DELAY  ESTIMATION  USING  THE  CROSS-BISPECTRUM 


4.1  TIME  DELAY  ESTIMATION  BASED  ON  THE  PHASE  OF  THE 
CROSS-BISPECTRUM 

Following  the  formal  definition  of  higher  order  spectra  for  a  single  process  in 
Section  2.1.2,  the  cross-bispectrum  can  be  written  as 

Bl12(G>1.  0)2)  =  32{[X1(t+T1)  X^t+12)  x2(t)]}  ,  (4.1) 

where  X-|(t)  and  X2(t)  are  two  stationary  zero  mean  random  processes.  By  arguments 
similar  to  those  given  in  Section  2.1.3,  it  can  be  shown  that  the  principal  domain  of  the 
cross-bispectrum  is  given  by  the  region  in  Fig.  4.1(a).  The  support  set  is  given  by 
Fig.  4.1(b).  The  cross-bispectrum  can  be  estimated  in  a  manner  completely  analogous  to 
the  bispectrum  estimate  given  by  Eqs.  (2.22)  and  (2.23). 

If  Xi  (t)  is  just  a  time  delayed  version  of  X2(t),  then 

Bii2(Ti.n)  =  Bt11(m,n)  e'27rfm*nT  .  (4.2) 


Thus  B 1 1 2  is  just  a  phase  shifted  version  of  Bui: 

eii2(m,n)  =  0ii1(mln)  +  27tfm+nx  ,  (4.3) 

where  0ii2  and  0m  are  the  phases  of  the  cross-  and  auto-bispectrum,  respectively. 
Several  non-parametric  methods  for  estimating  this  time  delay  have  been  put  forth  that 
essentially  rely  upon  estimates  of  the  cross-  and  auto-bispectrum  phases.  (Nikias  and  Pan, 
1988). 


The  obvious  estimator  of  the  phase  of  the  cross-bispectrum  is  just 


(J)ii2(m,n)  =  tan  '1 


Bi'i2  (m,n) 

.Bff2(m,n)_ 


(4.4) 
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where  B^]2  and  B^2  are  just  the  imaginary  and  real  parts  of  a  consistent  estimate  of  the 
cross-bispectrum.  The  estimate  of  the  auto-bispectrum  phase  is  similar.  In  Hinich  and 
Wilson  (1989)  we  derived  the  variance  and  distribution  of  this  phase  estimator  and  showed 
that  the  exact  distribution  could  be  approximated  by  a  Gaussian  distribution  in  most  cases. 

If  the  time  delay  is  estimated  from  Eq.  (4.3)  as 


x(m,n)  = 


(Pn^m.nj-cDmjm.n) 


then  it  can  also  be  approximated  by  a  Gaussian  distribution  since  it  is  just  the  scaled 
difference  between  two  approximately  Gaussian  random  variables.  The  variance  of  the 
phase  difference  in  additive  Gaussian  noise  was  shown  in  Hinich  and  Wilson  (1989)  to  be 

2  1 

VAR{ct>112(m,n)  -  <t>m(m,n)}=  2NM  Q  (m'n)  [up~Hfm)]  [l  V^P'UWn) 

Lyf  1 1  (tm.f„) 

X  (l-Refw^W*)  e-'2^.nx]}  ,  (4.6) 

where  W(12\f)  is  the  coherency  spectrum  of  the  noise: 


W12(f)  = 


P(,2}(f) 


The  effects  of  signal  skewness,  processing  parameters,  S/N,  and  noise  coherence  on  the 
variance  of  the  time  delay  estimate  are  clearly  demonstrated  in  Eq.  (4.6).  Specific  examples 
of  the  effects  of  these  parameters  are  described  in  Hinich  and  Wilson  (1989),  which  is 
included  as  Appendix  B  of  this  report. 

4.2  AN  ALTERNATIVE  TIME  DELAY  ESTIMATOR 


e  define  an  estimator  SbO)  such  that 


^  ^  ^  • 

SbO)  =  X  Bn2(m,n)  Bn^m.n)  , 

im,n)  m+n*i 


(4-8) 
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where  B112  and  Bui  are  consistent  estimators  of  the  cross-  and  auto-bispectrum 
respectively,  then 

e{§bO)}=  Z  B1i2(fm.fn)  B*in(fm.fn)  +  NPi(fm)Pi(fn)P’i2(fm+n)  •  (4.9) 

(m,n):m+n-i 


This  equation  derives  from  the  expression  of 
E{Xi(m)Xi(n)X2(m+n)X2(m)Xi(n)X2(m+n)}  in  terms  of  its  cumulants  and  the 

relationship  between  the  cumulants  of  DFTs  and  cumulant  spectra  given  by  Brillinger 
(1975).  If  we  define 


Sr(i)  = 


§B(i)  _ 

Pl(fm)Pl(fn)Pl2(f  m+n) 


and 


rn2(m,n) 


Bii2(m,n) 

VrPr(fm)P1(fn)P2(f  m+n) 


and  similarly  for  Fin  ,  then  we  have 


E{sr(i)}  =  Sr(fj)  +  NI(i)W*12(fi) 


(4.10) 


(4.11) 


(4.12) 


where 


Sr(fi)  =  Z  ^  1 1 2(fm>fn)  Pi  1  l(fm.fn) 

(m,n):m+n=i 

=  cross-skewness  spectrum  ,  (4.13) 

and  l(i)  is  the  number  of  terms  in  the  sum  in  Eq.  (4.8).  We  can  see  from  Eq.  (4.2)  that  if 
Xi(t)  is  a  time  delayed  version  of  X2(t),  then  the  phase  of  the  cross-skewness  spectrum  is 
just  27tfm+n  t'  and  its  magnitude  is  the  sunt  of  the  square  of  the  skewness  function  over  all 
(m,n):m+n=i.  Thus  the  time  delay  can  be  estimated  directly  from  the  phase  of  the  cross- 
skewness  spectrum  and  the  magnitude  of  the  cross-skewness  spectrum  at  each  frequency  is 
related  to  the  square  of  the  skewness  function  of  the  signal. 
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The  cross-skewness  spectrum  thus  has  characteristics  similar  to  the  cross-power 
spectrum.  If  in  analogy  to  the  cross-skewness  spectrum  we  define  a  cross-skewness 
correlation  rr(T)  as  the  Fourier  transform  of  the  cross-skewness  spectrum,  then  rp(T)  will 
have  many  of  the  properties  of  a  crosscorrelation  function.  For  a  broadband  non-Gaussian 
signal,  the  cross-skewness  correlation  will  have  a  maximum  at  the  time  delay  x'  and  be  near 
zero  elsewhere.  Thus  the  cross-skewness  spectrum  and  correlation  can  serve  as  useful 
tools  for  measuring  the  time  delay  of  broadband  non-Gaussian  sources. 
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DETECTION  OF  NON-GAUSSIAN  SIGNALS  IN  NON-GAUSSIAN 
NOISE  USING  THE  BISPECTRA  (Hinich  and  Wilson  (1990)) 

(This  paper  has  been  published  in  IEEE  Trans.  Acoust,  Speech, 
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Detection  of  non-Gaussian  signals  in  non-Gaussian  noise  using  the 
bispectra 


Melvin  Hinich  and  Gary  R.  Wilson,  Applied  Research  Laboratories,  The 
University  of  Texas  at  Austin,  Austin,  Texas  78713-8029 

Bispectrum  analysis  provides  a  means  of  testing  a  stochastic  time 
series  for  non-Gauss ianity  (including  nonlinearity).  For  cases  of 
practical  interest,  the  non-Gaussian  time  series  may  be  partially 
masked  by  either  Gaussian  or  non-Gaussian  noise.  In  this  paper  we 
cast  the  problem  of  detecting  a  non-Gaussian  time  series  in  the 
presence  of  additive  Gaussian  or  non-Gaussian  noise  into  a  classical 
hypothesis  testing  framework,  using  the  sample  bispectrum  as  the  test 
statistic.  The  power  of  the  test  is  demonstrated  as  a  function  of 
signal-to-noise  ratio,  the  degree  of  skewness  of  the  signal,  and 
processing  parameters.  The  results  are  compared  to  the  power  of  a 
classical  energy  detection  test. 


INTRODUCTION 

The  bispectrum  is  becoming  a  useful  and  practical  tool  for 
non-Gaussian  time  series  analysis  and  diagnosis  in  fields  such  as 
biomedicine  (Barnett,  et  al,  1971),  fluid  mechanics  (Lii,  et  al,  1976), 
oceanography  (Hasselmann,  et  al,  1963),  plasma  physics  (Kim  and  Powers, 
1979),  geophysics  (Hinich  and  Clay,  1968),  and  economics  (Hinich  and 
Patterson,  1985).  A  recent  tutorial  on  bispectrum  estimation  has  been 
provided  by  Nikias  and  Raghuveer  (1987).  Since  most  realistic 


measurements  are  corrupted  by  inte.fering  noise,  and  in  some  cases 
dominated  by  noise,  application  of  the  bispectrum  to  experimental 
measurements  of  a  time  series  requires  an  understanding  of  the  effects  of 
interfering  noise  on  the  bispectral  measurements. 

In  this  paper  we  are  only  concerned  with  the  effects  of  noise  on  the 
ability  of  the  bispectrum  to  detect  the  presence  of  a  non-Gaussian  time 
series.  The  effects  of  noise  on  the  accuracy  of  a  bispectrum  estimate  is 
not  directly  addressed.  Using  the  bispectrum  as  a  test  for  Gaussianity  of 
a  time  series  has  been  described  by  Hinich  (1982)  for  the  signal  only 
case.  In  this  paper  the  ability  to  test  for  Gaussianity  of  a  time  series 
in  the  presence  of  independent  noise  is  demonstrated  as  a  function  of 
signal-to-noise  ratio  (SNR).  In  Section  I  the  problem  is  posed  in  a 
classical  hypothesis  testing  framework.  In  Section  II  the  bispectrum  is 
defined,  followed  in  Section  III  by  development  of  the  bispectrum  test 
statistic  to  address  the  hypothesis  test  specified  in  Section  I.  The 
power  of  this  test  for  a  specific  signal  and  noise  case  is  demonstrated  in 
Section  IV,  and  compared  to  the  power  of  a  standard  energy  detection  test 
in  Section  V. 

I .  SIGNAL  MODEL 

Suppose  that  we  observe  a  signal  (s(t)}  plus  noise  {n(t)J,  where  the 
(s(t)}  is  a  non-Gaussian  (possibly  nonlinear)  stationary  random  process 
and  {n(t)}  is  in  general  also  a  non-Gaussian  stationary  process.  Let  s(t) 
have  a  power  spectral  density  Sg(f)  and  bispectrum  Bgtfjjf^).  Similarly, 
let  the  noise  n(t)  have  a  power  spectral  density  Sn(f)  and  bispectrum 
Bn(fj,fj<).  We  will  use  a  classical  hypothesis  testing  framework  to 
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determine  the  detectability  of  this  non-Gaussian  signal  in  additive  noise 
from  a  finite  sample  of  the  input  signal. 

Let  { y ( t ) }  denote  the  input  signal  that  we  observe  for  a  finite  time 
using  an  equally  spaced  discrete-time  sampling  method  that  avoids  any 
aliasing.  We  use  the  standard  convention  that  t  is  an  integer,  which 
means  that  the  sampling  interval  is  set  equal  to  one.  Under  the  null 
hypothesis  that  there  is  no  signal  present  in  the  observed  record, 
y(t)=n(t)  for  t=l,...,N  where  N  is  the  sample  size.  If  the  signal  is 
present,  y( t)«s( t)+n( t) .  We  present  a  test  of  these  compound  hypotheses 
for  a  given  false  alarm  probability  a  under  the  following  assumptions 
about  the  signal  and  noise. 

(1)  All  the  moment  functions  exist  for  the  signal  and  noise. 

(2)  The  signal  and  noise  are  M-dependent.  That  is,  for  any 

tl'**,tn'  (s(t^) , , ,s(t  )}  and  {s(tj+m) , . . . ,s(tn  m)}  are  independent  for 

m>M,  and  similarly  for  n(t). 

This  is  a  strong  form  of  a  "short  memory"  assumption  that  is  needed  to 
obtain  the  asymptotic  Gaussian  distribution  that  we  need.  It  is,  however, 
the  most  intuitive  of  the  standard  mixing  conditions  used  to  prove 
asymptotic  results.  These  two  assumptions  guarantee  that  the  signal  and 
noise  spectrum  and  the  polyspectra  of  all  orders  exist.  We  will  use  the 
estimated  bispectrum  of  a  sample  of  the  observed  y(t)'s  to  detect  the 
non-Gaussian  signal.  Let  us  now  review  the  definition  of  a  bispectrum  and 
its  use  as  a  tool  to  study  non-Gaussian  random  signals. 

II.  THE  BI SPECTRUM 

The  bispectrum  of  a  discrete-time  signal  is  a  periodic  function  in 
two  frequency  indices.  Let  {y ( t ) }  denote  a  real  zero  mean  stationary 
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continuous-time  random  signal.  Assume  all  expected  values,  sums,  and 

integrals  used  below  hold.  The  bicovariance  function  of  the  process  is 

c(u,v)-Ey( t)y( t+u)y( t+v) ,  which  does  not  depend  on  t  since  the  process  is 

stationary.  Its  Fourier  transform 

00 

B(f,g)  -  J  c(u,v)  exp(-i2rt(  fu+gv)  ]dudv  (1) 

—  00 

is  called  the  bispectrum.  The  bispectrum's  symmetry  lines  are  f«g, 
f-h( 2f*»-g) ,  and  g=h(2g«-f).  Another  symmetry  holds  since  c(u,v)  is  real, 
namely  B(-f ,-g,-h)=B*( f ,g,h) ,  where  *  denotes  complex  conjugation.  This 
symmetry  yields  the  symmetry  line  f— g  (Fig.  1).  Thus  the  pointed  cone 
C-{f,g:  0<f,g<f}  is  a  principal  domain  of  this  continuous  time  bispectrum 
in  the  (f,g)  plane. 

Now  consider  the  discrete-time  sequence  (y(n)}  where  the  sampling 
rate  fQ  is  greater  than  1/2.  The  bicovariance  function  of  this  sampled 
version  of  { y ( t ) }  is  really  an  array  {c(j,k):  j,k«0, (+/-)1, (+/-)2, . . . } . 
Then  the  bispectrum  of  the  sampled  data  is  defined,  analogous  with  (1),  to 
be  given  by  the  Fourier  transform  in  two  indices: 

BQ(f,g)  =  c(j,k)  exp[-i2fi( f+g]  .  (2) 

The  sampling  introduces  an  infinite  set  of  parallel  symmetry  lines 
2f+g-n,  2f-g=n,  f+2g=n,  and  f-2g=n.  The  cone  C  is  only  cut  by  the 

symmetries  2f+g-n,  and  thus  the  principal  domain  of  BQ  is  the  triangle 
{ f , g :  0<f<l/2,  g<f,  2f+g<l)  in  the  cone  C.  If  the  discrete-time  sequence 
is  stationary  and  unaliased,  then  the  bispectrum  of  the  sampled  data  can 
be  non-zero  only  in  the  triangle  defined  by  {f,g:  0  <  f  <  1/2,  g  <  f, 
f+g  <  1/2}  (Hinich  and  Wolinksky,  1988).  Let  us  now  drop  the  zero 
subscript  on  B  to  simplify  the  notation. 
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III.  BISPECTRUM  TEST  STATISTICS 

Given  a  sample  of  the  signal  { y ( 1 ) , . . . ,y(N) } ,  let  B^(fj,fk)  denote 
a  consistent  estimator  of  B(fj,fk)  for  a  sample  of  size  N  of  {y(n)}.  This 
estimator  can  be  computed  directly  by  smoothing  the  sample  bicovariance 
(Rao,  1983),  smoothing  the  sample  bispectrum  in  the  bifrequency  domain 
(Hinich,  1982),  or  by  dividing  the  sample  into  pieces  and  averaging  the 
piecewise  sample  bispectra  and  then  doing  bifrequency  smoothing  (Lii  and 
Rosenblatt,  1982).  Parametric  methods  can  also  be  used  to  estimate  the 
bispectrum  (see  Nikias  and  Raghuveer,  1987). 

Let  denote  the  bandwidth  of  the  bispectrum  for  the  smoothing 

method  used.  For  example,  if  the  sample  of  length  N  is  divided  into  L 

pieces  and  the  L  piecewise  sample  bispectra  are  averaged  and  the  result  is 

then  smoothed  in  bifrequency  over  a  square  whose  sides  are  of  length  M, 
LM 

then  ^  .  Assume  (to  satisfy  consistency)  that  A^-Od/JN).  Under 

some  short  memory  restriction  which  holds  for  the  M-dependence  stated  in 
assumption  (2),  Brillinger  and  Rosenblatt  (1967)  or  Rosenblatt  (1985)  show 
that  for  large  N,  the  distribution  of 

ANN1/2[B(N)(f .,fR)  -  B(fj,fk)]/[Sy(fj)Sy(fk)Sy(fj+k))1/2  (3) 

is  approximately  complex  Gaussian  N  (0,1)  where  S  is  the  spectrum  of  the 

c  Jr 

observed  signal.  This  approximation  is  of  order  0(1/N).  Thus  if  HQ  is 
true,  <Vjl/2B(N)(fj'fk)/fSy<fj)Sy(fk)Sy(fj+k^1//2  is  approximately  a 
complex  Gaussian  variate  whose  mean  is  Bn(fj,fk)  and  variance  is  1. 

Another  equally  important  large  sample  result  that  follows  from  the 
asyi«ptotic  results  developed  by  Brillinger  and  Rosenblatt  is  that  these 
statistics  can  be  treated  as  independent  random  variables  over  the  grid  in 
the  principal  domain  if  the  grid  width  is  larger  than  or  equal  to  the 
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bandwidth,  i.e.,  B^(fj,fk)  and  B^(fc,fs)  are  independent  for  j*k  or 

k*8  if  lfj+l-fjl^  or  lfs+l"fs^V 

The  asymptotic  independence  and  Gaussianity  imply  that  the  statistic 


CH<fj'*k>  -  2^|B(N>(fj,£k)-Bta(*j.*k)|2/\<tj)Sy(fk»Sy(£j+k)  («) 

is  approximately  a  central  chi-square  variate  with  2  degrees  of  freedom  if 
Hq  is  true.  If  the  signal  is  present  and  its  bispectrum  Bg  is  nonzero, 
the  CH( f j , f ^ )  is  approximately  a  noncentral  chi-square  statistic  with 
2  degrees  of  freedom  and  noncentrality  parameter 


A(j'k)  -  2A2N|Bs(fj,fk)|2/Sy(fj)Sy(fk)Sy(fj+k)  .  (5) 

Artificial  data  analyses  indicate  that  this  approximation  holds  for 
samples  as  small  as  N=256  if  A^=l/'lN  (Ashley,  Patterson,  and  Hmich, 
1986) . 

If  in  expression  (4)  we  replace  Bn(fj,fk)  with  B^N^(fj,fk),  (assuming 
that  we  have  a  noise-only  sample  of  the  data)  and  the  spectrum  Sy  at  the 
frequencies  (f.  )  with  a  consistent  spectrum  estimator  denoted  whose 

x  y 

rms  error  is  0(1/N),  we  then  have  a  statistic  that  is  approximately 
chi-square  2  under  the  null  hypothesis.  Summing  these  approximately 
chi-square  statistics  for  all  bifrequency  pairs  in  the  principal  domain 
yields  a  statistic  whose  distribution  is  approximately  chi-square  with  2K 
degrees  of  freedom,  where  K  is  the  number  of  bifrequency  pairs  for  the 
nonzero  bispectral  values  in  the  principal  domain  (approximately  N  /16). 
For  A^I/Tn,  K  is  given  approximately  by 

K!  -L  .  <6> 

l  ^  J 

The  detection  test  is  to  reject  H  if  the  test  statistic 
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TCH  -  2A2NE(j>k)|B(N)(fj/fk)-B<lN)(fj,fk)|2/S^N)(fj)S^N)(fk)S(N)(fj+k)  (7) 
is  greater  than  a  threshold  T  where  Pr(#L>T  )-a  and  T  can  be  determined 

Ot  ZK  Ot  OC 

from  tables  of  the  distribution.  The  summation  over  the  ( j , k )  is  over 
the  support  set  of  the  bi spectrum  given  by  {j,k:  0  <  j  <  1/2,  k  <  j, 
j+k  <  1/2}  The  false  alarm  probability  is  approximately  a  since  the 
distribution  of  TCH  is  approximately  that  of  a  variate  for  large  N. 
Note  that  the  detection  test  requires  an  estimate  of  the  noise-only 
bispectrum  and  an  estimate  of  the  spectrum  of  the  data  (or  a  priori 
knowledge  of  the  two).  The  statistical  power  of  this  test  to  detect  a 
non-Gaussian  signal  is  a  function  of  the  noncentrality  parameter  A  which 
is  the  sum  over  (j,k)  in  the  principal  domain  of  the  A(j,k)  given  by  (5). 
We  discuss  the  power  of  the  test  in  the  next  section. 

IV.  POWER  OF  THE  TEST 

From  (5),  the  noncentrality  parameter  for  the  test  statistic  is 
A  -  2NA2  E(j<k)  rs(fj#fk)  /  (l+p"1(fj))  (1+P_1(fk) )  (l+p_1(fj+k))  ,  (8) 

where 

rs<£j'£k>  -  lBs<£j-fk»|2/Ss(fj)Ss<fk)Ss(j+k)  (9) 

and  p(f )-Ss(f )/Sn(f )  is  the  signal-to-noise  power  ratio  at  the  frequency 
f.  rs^fj'fk^  *s  caHed  the  skewness  function  of  the  signal.  Let  us 
define  the  weighted  average  skewness  of  the  signal  as  rg,  where 

rs  •  4  £(j,k)  rs<£j’£k)u(£j)w<fk>"<fj+k>  '  <10) 

where  w(f )=( p(f )+p(f )p-1)/(  p(f )+l)  and  p  is  an  average  signal-to-noise 

ratio  (SNR)  across  all  the  frequencies.  The  function  w(f)  weights  the 

skewness  function  of  the  signal  by  the  SNR  at  each  of  the  three 


relative  to  the  average  SNR.  If  p(f)-p,  then 


frequencies  fy  fk,  and  fj+k 
w(f )-l.  Thus  the  noncentrality  parameter  is 


It  is  clear  from  (11)  that  A  is  linear  in  r.  and  approximately  cubic  in  p 
(for  low  SNR).  It  is  also  clear  that  the  power  of  the  test  is  dependent 
on  the  skewness  of  the  signal  but  independent  of  the  noise  skewness, 
assuming  that  it  is  known  or  can  be  estimated. 

The  probability  of  detection  is  the  probability  that  the  detection 
statistic  given  by  (7)  will  exceed  the  threshold  T  ,  where  the  detection 
statistic  under  the  alternative  hypothesis  is  noncentral  chi-square  with 
2K  degrees  of  freedom  and  noncentrality  parameter  given  by  (11).  We  will 
now  examine  the  probability  of  detection  as  a  function  of  average  SNR  p, 
weighted  average  skewness  rg,  and  processing  bandwidth  Z^.  In  all  cases, 
the  probability  of  false  alarm  a  was  set  to  10-3. 

Shown  in  Fig.  2  is  a  plot  of  the  probability  of  detection  as  a 
function  of  average  SNR  for  several  values  of  the  weighted  average 
skewness  rg.  For  these  cases,  N=104,  ^=0.01,  and  K-625.  Since  the 
noncentrality  parameter  given  by  Eq.  11  has  a  linear  dependence  on  r  and 
approximately  a  cubic  dependence  on  p,  at  low  average  SNR  it  is  necessary 
for  rs  to  increase  by  a  factor  of  8  to  achieve  a  3  dB  improvement  in 
detection  performance.  This  behavior  can  be  observed  in  Fig.  2.  Also 
because  of  the  cubic  dependence  on  p,  the  detection  curves  exhibit  a  rapid 
increase  in  probability  of  detection  from  near  0  probability  to  near  a 
probability  of  1  over  a  small  SNR  range  of  only  4-5  dB. 

A  slightly  different  display  of  the  relationship  between  the  weighted 


42 


average  skewness  and  the  SNR  is  shown  in  Fig.  3.  In  this  figure,  the 
value  of  the  weighted  average  skewness  needed  to  achieve  a  probability  of 
detection  of  0.5  is  plotted  as  a  function  of  average  SNR  for  the  same 
values  of  N,  z^j,  and  K  as  given  above.  The  average  SNR  needed  to  achieve 
a  probability  of  detection  of  0.5  is  often  referred  to  as  the  minimum 
detectable  level  (MDL).  At  low  average  SNR  the  cubic  dependence  can  be 
observed  in  the  slope  of  the  curve. 

The  noncentrality  parameter  also  has  a  linear  dependence  on  N,  the 
sample  size.  Thus  it  is  necessary  to  increase  the  number  of  samples  by  a 
factor  of  8  for  a  fixed  value  of  rg  to  achieve  a  3  dfi  improvement  in 
detection  performance  at  low  average  SNR.  Figure  4  shows  a  plot  of  N  as  a 
function  of  average  SNR  for  a  probability  of  detection  of  0.5.  For  this 
case,  A^O.Ol,  K=625,  and  rs«8. 

It  can  be  seen  from  the  previous  figures  that  the  bispectrum  detector 
can  detect  non-Gauss ian  signals  at  low  average  SNR  for  reasonable  sample 
sizes  and  reasonable  (in  our  experience)  values  of  the  weighted  average 
skewness.  Thus  it  appears  that  the  bispectrum  can  be  used  to  detect  a 
non-Gaussian  signal  even  in  the  presence  of  quite  low  signal-to-noise 
ratios.  We  will  now  compare  this  performance  to  an  energy  detector. 

V.  COMPARISON  TO  ENERGY  DETECTION 

If  we  wish  to  detect  the  presence  of  the  signal  without  using  its 
non-Gaussian  property,  a  reasonable  choice  for  the  detector  would  be  an 
energy  detector.  Energy  detection  relies  on  the  power  of  the  signal  only 
and  does  not  attempt  to  make  use  of  any  possible  non-Gaussian  properties 
of  the  signal.  Thus  comparing  the  performance  of  the  bispectrum  detector 
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to  the  energy  detector  will  serve  to  illustrate  the  effect  on  detection 

performance  of  utilizing  the  non-Gaussian  property  of  the  signal. 

The  noise  will  be  assumed  to  be  Gaussian  with  zero  mean  and  variance 

The  signal  is  also  assumed  to  have  zero  mean  and  variance  pa^,  where 

p  is  the  average  SNR,  as  before.  Since  the  energy  detector  does  not  make 

use  of  the  non-Gaussianity  of  the  signal,  the  signal  will  also  be  assumed 

to  be  Gaussian.  Thus  the  signal  plus  noise  is  zero  mean  Gaussian  with 
2 

variance  (l+p)aN.  For  the  observed  time  series  y(t),  t-l,2,...,N,  the 

test  statistic  for  the  energy  detector  is 

N  9  0 

TE  “  E  /(t)/*  •  (12) 

The  energy  detector  requires  a  knowledge  of  the  variance  of  the  noise 
process.  If  the  variance  of  the  noise  is  not  known  a  priori,  it  can  be 
replaced  by  its  sample  estimate,  assuming  a  sample  of  the  noise  only 
process  is  available.  Under  the  null  hypothesis  of  noise  only,  TE  is  a 
central  chi-square  variate  with  N  degrees  of  freedom.  Under  the 
alternative  hypothesis  of  signal  plus  noise,  TE/(l+p)  is  central 
chi-square  N.  The  detection  test  is  to  reject  the  null  hypothesis  if  the 
test  statistic  TE  is  greater  than  a  threshold  Ta,  where  Pr ( x^>Ta)=a,  and  a 
is  the  probability  of  false  alarm.  The  probability  of  detection  is  the 
probability  that  TE/(l+p)  will  exceed  T  . 

Figure  5  shows  the  probability  of  detection  as  a  function  of  SNR  for 
the  energy  detector  for  o£»10~^  and  several  values  of  the  sample  size  N. 
Because  of  the  linear  dependence  on  SNR,  in  contrast  to  the  cubic 
dependence  of  the  bispectrum  detector,  the  transition  region  from  low  to 
high  probability  of  detection  is  more  gradual.  For  a  probability  of 
detection  of  0.5,  Fig.  6  contains  a  plot  showing  the  sample  size  required 
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to  achieve  detection  at  a  specified  SNR.  For  low  SNR,  the  sample  size  is 
required  to  increase  by  a  factor  of  4  to  achieve  detection  at  a  3  dB  lower 
SNR. 

A  comparison  of  the  detection  performance  of  the  bispectrum  detector 
and  the  energy  detector  can  be  made  from  Fig.  7.  The  SNR  required  to 
achieve  a  probability  of  detection  of  0.5  at  a  false  alarm  rate  of  10-3 
for  a  specified  sample  size  for  the  energy  detector  and  the  bispectrum 
detector  is  shown  in  this  figure.  The  detection  performance  of  the 
bispectrum  detector  depends  not  only  on  the  sample  size  and  the  SNR,  as 
the  energy  detector  does,  but  also  on  the  value  of  the  weighted  average 
skewness.  Detection  performance  curves  are  given  for  the  bispectrum 
detector  for  two  values  of  the  weighted  average  skewness  in  Fig.  7. 
Obviously,  the  larger  the  weighted  average  skewness,  the  better  the 
bispectrum  detector  will  perform.  This  example  just  demonstrates  that  if 
the  signal  is  sufficiently  non-Gaussian,  the  bispectrum  detector  can 
detect  it  at  a  lower  SNR  than  the  energy  detector  could  detect  a  Gaussian 
signal  with  the  same  sample  size.  If  the  energy  detector  were  presented 
with  a  non-Gaussian  signal  or  non-Gaussian  noise,  its  detection 
performance  is  likely  to  be  even  worse  than  the  optimum  results  presented 
here  (Machell  and  Penrod,  1989).  On  the  other  hand,  the  bispectrum  will 
completely  fail  to  detect  a  Gaussian  signal. 

VI.  SUMMARY  AND  CONCLUSIONS 

Because  bispectrum  processing  is  becoming  a  more  widely  used  tool  in 
time  series  analysis,  it  is  useful  to  understand  the  behavior  of  the 
bispectrum  in  the  presence  of  practical  measurement  problems  such  as 
interfering  noise.  In  this  paper  we  have  focused  on  the  ability  of  the 
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bi^pectrum  to  detect  a  non-Gaussian  time  series  when  that  time  series  is 
corrupted  by  non-Gaussian  (or  Gaussian)  noise.  The  dependence  on  weighted 
average  skewness,  sample  size,  and  average  SNR  have  been  explicitly 
identified  and  demonstrated.  It  was  shown  that  for  reasonable  values  of 
weighted  average  skewness  and  sample  size,  the  bispectrum  can  detect 
non-Gaussian  signals  at  quite  low  average  SNR.  It  was  also  shown  that  for 
reasonable  values  of  skewness  and  sample  size,  the  bispectrum  will  detect 
a  non-Gaussian  signal  at  lower  SNR  than  energy  detection  will  detect  a 
Gaussian  signal.  Thus  it  is  concluded  that  the  bispectrum  can  be  used 
effectively  to  detect  non-Gaussian  signals  in  the  presence  of  interfering 
noise,  and  may  perform  better,  depending  on  the  degree  of  non-Gaussianity, 
than  energy  detection.  The  results  presented  in  this  paper  can  be  used  to 
determine  under  what  conditions  of  weighted  average  skewness,  sample  size, 
and  average  SNR  the  bispectrum  can  be  used  to  detect  a  non-Gaussian 
signal. 
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LIST  OF  FIGURES 


Figure  1  -  Symmetry  lines  and  principal  domain  of  the  continuous  time 
bispectrum. 

Figure  2  -  Probability  of  detection  as  a  function  of  SNR  for  the 
bispectrum  detector. 

Figure  3  -  Weighted  average  skewness  required  to  achieve  a  specified 
minimum  detectable  level  for  the  bispectrum  detector. 

Figure  4  -  Sample  size  required  to  achieve  a  specified  minimum  detectable 
level  for  the  bispectrum  detector. 

Figure  5  -  Probability  of  detection  as  a  function  of  SNR  for  the  energy 
detector. 

Figure  6  -  Sample  size  required  to  achieve  a  specified  minimum  detectable 
level  for  the  energy  detector. 

Figure  7  -  Comparison  of  the  minimum  detectable  level  achieved  by  the 
bispectrum  detector  and  the  energy  detector. 
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APPENDIX  B 

TIME  DELAY  ESTIMATION  LSI  NO  THE  CROSS-BISPECTRLM 

(Hinich  and  Wilson  (1989)) 

(This  paper  has  been  accepted  for  publication  in 
IEEE  Trans.  Acoust.,  Speech,  and  Signal  Proc.) 
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Time  delay  estimation  using  the  cross-bispectrum 

Melvin  J.  Hinich  and  Gary  R.  Wilson,  Applied  Research  Laboratories,  The 
University  of  Texas  at  Austin,  Austin,  Texas  78713-8029 


The  cross  bispectrum  phase  can  be  effectively  used  to  estimate  the  time 
required  for  a  non-Gaussian  signal  to  propagate  between  a  pair  of 
spatially  separated  sensors  in  the  presence  of  highly  correlated 
Gaussian  noise.  In  this  paper  we  present  a  consistent  estimator  of  the 
phase  of  the  cross  bispectrum,  derive  the  exact  distribution  of  the  phase 
of  a  complex  Gaussian  sample  bispectrum,  and  show  that  in  most  cases 
the  exact  distribution  can  be  approximated  by  a  Gaussian  distribution. 
Using  this  Gaussian  approximation,  we  derive  the  variance  of  the  time 
delay  estimate  computed  from  the  sample  cross  bispectrum  of  a  signal  in 
additive  correlated  noise.  This  results  allows  the  performance  of  time 
delay  estimators  based  on  the  cross  bispectrum  phase  to  be  quantified 
as  a  function  of  the  sample  size,  the  skewness  of  the  signal,  the  signal-to- 
noise  ratio  (SNR),  and  the  noise  correlation. 


INTRODUCTION 

The  estimation  of  the  time  required  for  a  signal  to  propagate  between  two 
spatially  separated  receivers  is  a  fundamental  approach  for  measuring  the 
direction  of  arrival  of  the  signal  relative  to  the  axis  of  the  sensors.  Estimating 
direction  of  arrival  of  a  propagating  signal  is  a  standard  signal  processing  task 
in  geophysics,  acoustics,  and  astronomy,  as  well  as  in  radar  and  sonar  systems. 

The  most  widely  used  approach  to  estimating  time  delay  is  to  find  the 
peak  of  the  sample  crosscorrelation  function  of  the  outputs  of  the  two  sensors 
(Hamon  and  Hannan,  1974,  and  Knapp  and  Carter,  1976).  This  approach 
works  well  if  the  signal  is  highly  correlated  and  the  noise  is  uncorrelated.  In 
cases  where  the  noise  is  correlated,  ambiguous  results  are  often  produced  from 
this  crosscorrelation  approach. 
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If  the  signal  is  a  stationary  NON-GAUSSIAN  time  series,  then  higher 
order  spectra  can  be  used  to  unambiguously  estimate  the  time  delay  even  in 
the  presence  of  correlated  noise.  Several  methods  for  estimating  the  time  delay 
using  the  cross  bispectrum  have  been  presented  by  Nikias  and  Pan  (1988). 
The  non-parametric  methods  essentially  rely  upon  an  estimate  of  the  phase  of 
the  sample  cross  bispectrum  computed  from  the  sensor  outputs.  The  statistical 
properties  of  such  estimators  have  to  be  derived. 

In  this  paper  we  present  a  consistent  estimator  of  the  phase  of  the  cross 
bispectrum,  derive  the  exact  distribution  of  the  phase  of  a  complex  Gaussian 
sample  bispectrum,  and  show  that  in  most  cases  the  exact  distribution  can  be 
approximated  by  a  Gaussian  distribution.  Using  this  Gaussian  approximation, 
we  derive  the  variance  of  the  time  delay  estimate  computed  from  the  sample 
cross  bispectrum  of  a  signal  in  additive  correlated  noise.  This  results  allows  the 
performance  of  time  delay  estimators  based  on  the  cross  bispectrum  phase  to 
be  quantified  as  a  function  of  the  sample  size,  the  skewness  of  the  signal,  the 
signal-to-noise  ratio  (SNR),  and  the  noise  correlation. 

This  paper  is  organized  as  follows.  Section  I  provides  an  introduction  to 
the  cross-bispectrum,  followed  by  a  presentation  of  a  statistically  consistent 
estimator  of  the  cross-bispectrum  phase  in  Section  II.  Section  III  gives  the 
derivation  of  the  distribution  of  the  phase  estimator,  and  Section  IV  compares 
this  distribution  with  an  approximate  distribution.  Sections  V  and  VI  apply  these 
results  to  the  specific  application  of  time  delay  estimation  of  a  non-Gaussian 
signal,  and  Section  VII  includes  the  effects  of  additive  Gaussian  noise.  Section 
VIII  compares  the  cross-bispectrum  time  delay  estimation  to  time  delay 
estimation  based  on  the  cross-power  spectrum,  and  Section  IX  summarizes  the 
results  of  the  paper. 

I.  THE  CROSS-BISPECTRUM 

Let  {x}  denote  a  two  dimensional  zero  mean  vector  random  process  in 
continuous  time  whose  components  are  denoted  (x-| }  and  (x2).  Assume  that 
(1)  (x)  is  strictly  stationary,  and  (2)  the  joint  density  of  [x(t  j),...x(tn)]  for  any  finite 
sequence  (t-j . tn)  has  bounded  moments,  and  (3)  the  process  has  finite 
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memory.  By  finite  memory  we  mean  that  there  exists  a  time  shift  D  such  that  for 
any  (ti  ,...tn),  [x(ti),...,x(tn)]  and  [x(ti+d),...,x(tn+d)]  are  independent  for  ail  d  >  D. 

These  three  assumptions  are  more  stringent  than  those  used  by 
Brillinger  (1975)  and  Rosenblatt  (1985)  to  prove  the  asymptotic  properties  of 
sample  bispectra  of  vector  processes  that  we  use  in  this  paper.  The  more 
restrictive  assumptions  that  we  employ  are  easier  to  understand  than  the  more 
general  conditions  and  other  restrictions  on  the  joint  distributions  of  the  process 
which  Brillinger  and  Rosenblatt  use. 

Suppose  that  the  two  processes  are  simultaneously  observed  and 
sampled  using  a  standard  method  for  obtaining  unaliased  discrete-time 
sampled  data.  The  processes  are  filtered  by  a  bandpass  filter  whose  effective 
cutoff  frequency  is  denoted  fg,  and  the  series  are  sampled  at  the  sampling  rate 
2f0.  Let  us  simplify  notation  by  setting  the  time  interval  to  the  sampling  interval 
1/2f0  and  by  using  n  rather  than  tn=n/2f0  as  the  nth  time  point. 

The  1 .1 .2  cross-bispectrum  Bi  i2(tg)  is  defined  as  follows. 


Bii2(f.g)=  X  X  c1i2(r,s)exp[-i27i(fr+gs)]  ,  (1) 

f  as  — oo  S  **  ~oa 

where 

Cn2(r,s)  =  E[x1(n+r)x1(n+s)x2(n)]  (2) 

is  the  cross-bicovariance.  The  following  inverse  Fourier  integral  transform  of 
the  cross-bispectrum  yields  the  relationship 

c112(r,s)=f  B112(f,g)  exp(i2jt(fr+gs)  ,  (3) 

JQ 

where  Q  is  the  square  {f,g:  -1/2  <  f  <  1/2,  —1/2  <  g  <  1/2). 

The  symmetry  lines  of  this  cross-bispectrum  are  easily  derived  from  the 
Cramer  representation 
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xk(n)  =  exp(i2^fn)dAk(f)  fork=1,2 


(4) 


where  E[dAi  (f)dAi  (g)dA2(h)]  =  Bi  i  2(f.9)dfdg  if  h=— f— g,  and  zero  otherwise. 
Clearly  there  is  only  one  symmetry  line,  namely  f=g.  Since  the  processes  are 
real,  there  is  one  conjugate  symmetry  line  in  the  (f,g)  plane,  the  line  defined  by 
f+g=0(h=0).  It  thus  follows  that  the  principal  domain  PD  of  the  1,1,2  cross¬ 
bispectrum  is  the  union  of  two  triangles  T1={f,g:  0  <  f  <  1/2,  0  <  g  <  f}  and 
T2={f,g:  0  <  f  <  1/2,  — f  <  g  <  0}  (see  Fig.  1). 

The  support  set  for  the  amplitude  of  the  cross-bispectrum  is  a 
proper  subset  of  the  PD.  B112  1S  identically  zero  in  the  triangle 
OT={f,g:  0  <  f  <  1/2,  1/2-f  <  g  <  f}  by  the  same  argument  as  in  Himch  and 
Wolmsky  (1988)  for  bandlimited  signals. 

II.  ESTIMATING  THE  PHASE  OF  THE  CROSS-BISPECTRUM 

In  this  section  we  discuss  a  simple  approach  to  obtaining  an 
asymptotically  unbiased  and  Gaussian  estimator  of  the  real  and  imaginary  parts 
of  Bi  1 2-  These  estimators  are  then  transformed  to  yield  an  asymptotically 
unbiased  and  Gaussian  estimator  of  the  phase  function  o(f,g),  where  in  polar 
form 


Bii2(f.g)  =  |Bii2(f.g)|  exp[ip(f,g)]  .  (5) 

Let  N  denote  the  sample  size  for  the  observation  of  x(n).  Select  a  block 
length  L  which  is  approximately  YN.  The  number  of  whole  blocks  in  the 
record  of  length  N  is  then  [N/L],  which  is  also  approximately  fN.  Let  Xkp(fm) 
denote  the  fm=m/L  term  of  the  discrete  Fourier  transform  of  the  pth  piece 

{xk(1  +L*(p— 1 )) . xk(L+L*(p-1 ))}  of  data  (k=  1 ,2) .  The  "raw"  1,1,2  cross- 

bispectral  estimator  is  the  set  products 

Fp(m.n)  =  L-^Hm^t— fn)X2<Wn)  (6) 
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The  expected  value  of  Fp(m,n)  is  approximately  B-j  I2(fm.fn)  with  an  error 
of  order  o(1/L).  Denoting  the  spectrum  of  (xk(n)}  by  Sk(f),  the  large  sample 
variance  of  the  real  and  imaginary  pahs  of  (2/L)1/2Fp(m,n)  js 

[l+5(m-n)]Si(fm)Si(fn)S2(f  rrnn)  +  o(l-')  ,  (7) 

where  8(0)=1  and  5(k)=0  for  nonzero  k.  These  results  follow  from  the  asymptotic 
theory  presented  in  Chapter  4  *  Brillinger  (1975).  Moreover,  the  real  and 
imaginary  parts  are  Gaussian  and  asymptotically  independent  as  L  -> «. 

The  estimator  Si  12  of  B112  which  we  use  is  the  average  of  these  Fp 
values  over  the  [N/L]  non-overlapping  pieces.  The  resolution  bandwidth  of  this 
estimator  is  An=1/L.  If  L  is  greater  than  the  finite  memory  length  D,  then  the 
variances  of  ReSn2(fm.fn)  and  lmSn2(m,n),  the  real  and  imaginary  parts  of 
Sii2(fm-fn),  respectively,  are  equal  to  1  /[N/L]  times  the  large  sample  variance  of 
the  real  and  imaginary  parts  of  Fp(m,n),  which  is  proportional  to  the  expression 
in  Eq.  7.  In  other  words,  the  large  sample  variances  of  the  real  and  imaginary 
parts  are  equal  to  the  product  of  the  three  spectra  multiplied  by  o2=(2NAn2)-1  . 
If  we  assume  that  L=o(VN),  then  the  bandwidth  An  and  the  large  sample 
variance  go  to  zero  as  N  00.  Moreover,  the  real  and  imaginary  parts  of  Si  12 
are  asymptotically  Gaussian  and  independent.  We  will  assume  that  N  and  L 
are  sufficiently  large  and  the  processes  are  sufficiently  well  behaved,  that  we 
may  apply  these  asymptotic  results  as  large  sample  approximations. 

There  is  one  more  large  sample  approximation  that  plays  an  essential 
part  in  the  application  of  statistical  polyspectra  theory.  Returning  to  the  raw 
cross  spectral  values,  Fp(fm,fn)  and  Fp(fr.fs)  are  approximately  INDEPENDENT 
for  large  L.  This  implies  that  the  same  holds  for  the  Si  1 2  at  different 
bifrequencies. 

Given  these  estimators  of  Si  1 2.  the  obvious  estimator  of  the  phase 
function  at  the  Fourier  frequencies  (fm.fn)  is 


O  (fm.fn)  =  tan 


S|(m,n)/SR(m,n) 


_  D.  <  0(fm,fn)  <  ^ 


n 


(8) 


where  we  now  use  the  convenient  abbreviations  SR(m,n)  and  S|(m,n)  for  the 
real  and  imaginary  parts,  respectively,  of  the  ESTIMATOR  of  the  1,1,2  cross¬ 
bispectrum.  We  will  use  Br  and  B|  as  abbreviations  of  the  real  and  imaginary 
parts  of  the  TRUE  cross-bispectrum. 


III.  THE  DISTRIBUTION  OF  THE  PHASE  ESTIMATOR 


For  each  (m,n)  pair,  SR(m,n)  and  S|(m,n)  are  asymptotically  Gaussian 
and  independent  with  expected  values  BR(m,n)  and  B|(m,n)  respectively.  We 
will  now  derive  the  exact  distribution  of  the  phase  estimator  when  they  are 
exactly  Gaussian.  It  can  be  shown  (Thomas,  1969)  that  the  probability  density 
function  of  0(fm.fn)  given  by  Eq.  8  is 


4 


4 


4 


4 


4 


f(O)  =  1  e  2c2 
11 


r  Tm?  sin  2(o-6) 

+  cos(o-b)  e  2a2  erf 

i2llo 


,0) 


where  the  dependence  on  fm  and  fn  has  been  dropped  for  compactness  of 
notation.  The  function  r-|i2(fm,fn)  is  called  the  cross-skewness  function  and  is 
given  by 


T 1 1 20m. fn)  — 


|Bii2(fm.fn) 


m+nl 


(10) 


4 


The  parameter  6  is  just  the  true  phase  of  the  cross-bispectrum  and  is  given  by 


e(fm,fn)  =  tan  1 


Bi(fm,fnn 

BR(Wn)J 


(11) 


< 


It  can  be  seen  from  Eq.  9  that  if  Tii2  is  zero,  th  phase  estimator  is 
uniformly  distributed,  as  expected.  On  the  other  hand,  if  fn2  is  large,  then  the 
second  term  in  the  sum  in  Eq.  9  dominates  and  the  phase  estimator  has  an 
approximately  Gaussian  distribution  for  O  near  0. 


66 


The  variance  of  the  phase  estimator  is  given  by  the  following.  Center  O 
by  replacing  0-0  in  Eq.  9  with  O.  Then, 


E 

2 

e{o2}  =  J  o2f(o)do 

_  K 
2 


-In; 

=  id  e  2a2  +  2 

12  K 


-rf,»  -  s'frTif } 

e  2^  ^  2o2  L 

H  (2j-1 )!! 


2 

I 


2 

O  cos2JO  dO 


(12) 


where  the  series  expansion  for  the  error  function  has  been  used  to  rewrite  the 
density  function  given  by  Eq.  9  in  the  equivalent  form: 


■In? 


f(o)  =  1  e  2</ 


n 


(r2 

2>  1  112 
V2  a2) 
(2j-1 )»! 


cos2i  O 


(13) 


By  repeated  substitution,  the  remaining  integral  can  be  evaluated  as 


r11 

' 

Jo 


2  O^  cos2)  O  dO  = 


(2j-l)l! 

(2j)H 


12  2 


k=1  k^ 


(14) 


The  expression  (2j— 1  )!!=1  -3-5-  (2j— 1 )  and  (2j)!!=2-4-6  -  (2j).  Note  that  (2j) ! !=2-*(j !). 
Substituting  this  equation  into  Eq.  12  and  making  use  of  the  series  expansion 
for  the  exponential  gives  for  the  variance 


(15) 


This  is  the  exact  expression  for  the  variance  of  the  phase  estimator  using  the 

asymptotic  distribution  of  the  cross-bispectrum.  For  I'i  12=0  the  variance  is 

2 

71  ,  as  it  should  be  for  a  uniform  distribution.  Since 
12 


for  all  j,  then 


which  implies  that  E{o2}  >  0,  as  it  should. 


The  variance  of  the  phase  estimator  under  the  Gaussian  approximation 
can  be  determined  by  the  following. 

Let  eR  and  e|  denote  the  random  error  in  Sr  and  S|,  i.e. , 


SR(m,n)  =  BR(m,n)  +  eR(m,n)  ,  S|(m,n)  =  B|(m,n)  +  ei(m,n)  .  (16) 

Expanding  Eq.  8,  we  have 

tan‘1(S|/SR)-tan'1(B|/BR)=  (l+Bi^/BiD’CBR^erBR^BieR)  +  o(o) 

=  (B^+Bf^BRei-BieR)  +  o(o)  , 

where  o2=(2NA2n)'1  is  the  variance  of  eR  and  e|.  Thus  the  large  sample 
variance  of  O(fm.fn)  for  m*n  is 


Var  <D(fm,fn )  =  o2rfi2(Wn)  •  (18) 

When  m=n,  the  large  sample  variance  is  2o2T1 12(fm.1n)- 

The  important  feature  to  note  is  that  the  statistics  of  the  phase  estimator 
are  determined  by  the  cross-skewness  function  and  the  sample  size,  and,  in 
particular,  the  variance  of  the  phase  estimator  is  inversely  proportional  to  the 
square  of  the  cross-skewness  function. 


68 


The  distribution  function  of  the  phase  estimator  is  given  by 


0 

F<j>(<p)  =J  f(o)  d<h 

_  ZL 
2 


(19) 


2c f 


cos2i<l>  dO 


1  0 
=  — -  +  —  + 


2  7C 


sing  £ 

i=i 


(1I12)  y  2j-i 
coso  + 

2j(2j-1 )!!  L 


I 


k=1 


(2j-l)(2j-3)--(2j-2k+1 

2k(j-0(j-2)--(j-k) 


)  2 

cosp 


IV.  COMPARISON  OF  THE  EXACT  AND  mPPROXIMATE  DISTRIBUTION  OF 
THE  PHASE  ESTIMATOR 


A  Q-Q  plot  of  the  exact  distribution  of  the  phase  estimator  given  by  Eq.  19 
and  its  Gaussian  approximation  w.th  variance  (Tii2/o)'2  is  shown  in  Fig.  2  for 
each  of  three  different  values  of  the  ratio  Hi  2 /a.  For  Tn  2/0=1,  the  exact 
distribution  has  smaller  tails  than  the  Gaussian  distribution,  but  as  Tii2/o 
increases  the  tails  become  larger  than  the  Gaussian  distribution  before 
converging  to  the  normal  distribution.  A  relatively  small  value  of  l'112/o  results 
in  good  agreement  between  the  exact  and  approximate  distributions. 


Figure  3  shows  a  plot  of  the  percent  difference  between  the  exact  and 
approximate  distributions  as  a  function  of  r-|i2/c?  for  three  values  of  the 
distribution  near  the  tail.  Again,  good  agreement  is  achieved  at  relatively  small 
values  of  fi  1 2/0. 
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The  percent  difference  between  the  exact  variance  given  by  Eq.  15  and 
approximate  variance  given  by  (Ti  1 2/o)‘^  is  shown  in  Fig.  4  as  a  function  of  the 
ratio  fi  i2/o-  For  small  values  of  the  ratio,  the  percent  difference  is  positive, 
indicating  that  the  exact  distribution  has  smaller  tails,  as  was  shown  in  Fig.  2. 
As  the  ratio  gets  larger,  the  tails  become  larger  than  the  normal  and  the  percent 
difference  becomes  negative. 

V.  LINEAR  RELATIONSHIP  BETWEEN  THE  SERIES 

Suppose  that  {x2(n)}  is  the  output  of  a  linear  filter  applied  to  {xi  (n)}  with 
impulse  response  {h(n)},  i.e.,  X2(n)=lrnh(m)xi  (n-m).  We  only  assume  that  the 
filter  is  stable.  Then  dA2(f)=H(f)dAi  (f)  where  H(f)  is  the  filter's  transfer  function. 
Thus  S2(f)=|H(f)|2Si(f)  and 

Bi12(f,g)dfdg  =  E[dA1(f)dA1(g)dA1(-f-g)] 

=  H(-f-g)B111(f,g)dfdg  , 

where  Bi  1 1  is  the  bispectrum  of  {xi(n)}.  Because  of  the  linear  relationship 
between  Xi  and  X2,  the  regions  T1  and  T2  in  the  cross-bispectrum  (see  Fig.  1) 
contain  the  same  information.  In  the  T2  region,  Eq.  20  becomes 

Bii2(f.-g)  =  H(g-f))B*111(f-g.g)  . 


Thus  it  is  only  necessary  to  use  the  T1  region  to  estimate  time  delay.  Applying 
Eq.  20  and  the  above  result  for  the  spectrum  of  x2  to  Eq.  10,  rii2=rm  where 
Hu  is  the  skewness  function  of  xi.  In  other  words,  the  cross-skewness  is  the 
skewness  of  each  process  if  the  two  are  linearly  related.  Thus  from  Eq.  18,  the 
large  sample  variance  of  the  phase  is 


Var  0(fm,fn)  -  o2riii(fm,fn) 


(21) 


VI.  TIME  DELAY  ESTIMATION 

Suppose  further  that  the  linear  relationship  is  given  by 
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m  *  t 


i.e. ,  x  i  is  just  a  time  delayed  version  of  x2-  Then, 

Bn2(fm,fn)  =  Bin(frr,f,)e^-)t  •  (23) 

Thus  B 1 1 2  is  just  a  phase  shifted  version  of  B  ,  1 1 : 

1?(^m.^n)  =  t)l  1  l(ini.*n)  +  27r(fm+n)'C  .  (24) 

The  fi  ;iu  delay  can  thus  be  estimated  as  the  scaled  difference  between  the 
phcoe  estimates  of  the  ctoss  and  auto  bispectra: 

\  ‘h  ,f:-f  ‘IhuUm.fn)  (25) 

•-'Jdmi  n 

In  practice,  this  time  delay  est-mah?  can  be  improved  by  averaging  over 
frequency  >  that  have  large  bispectral  value.:  or  other  equivalent  averaging 
methods  (iVkias  and  Pan,  I960) 


Since  the  bispectra  phases  are  asymptotically  Gaussian  for  large 
skewness  values,  the  time  delay  estimate  is  also  asymptotically  Gaussian. 
Using  the  Taylor  series  expansion  for  the  bispectrum  phase,  (Eq.  17)  gives  for 
the  phase  differences 

cjj^f  f  )  _  Qd(f  f  )  _  zlL  ^H2(^m.^n)  6ii2^mJn)  ~  Byi2(fm.fn)  6l12(^m.^n) 

21  |B112(fm.fn)N 

1  1  /A  /A  \ 


Bl  1 1  (fmdn)  6^  i  idm»fn)  ~  Bn  i(fm.fn)  6l  ll(^mJn) 
|  B  1  1  1  (fm,fn)|2 
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where 


^dOm.fn)  -  *^1 120m. ^n)  ^1 1 1  (fm.W  . 


6d0m.fn)  -  6l120ri.fn)  —  QlllOm.fn)  •  (28) 

The  asymptotic  variance  of  <l><j  can  now  be  calculated  from  Eq.  26  by  making 
use  of  the  following  relationships. 

E*|jeii'|(fm.fn)|?j*  =  VAR{  S 1 1 i(fm,fn)}  =  2o2Si(fm)Si(fn)Si(fm+n)  +  i  (29) 


E{|e112(fm,fn)|2}  =VAR{S111(fm.fn)} 

=  2a2S1(fm)Si(fn)S2(f  m+n)  + 

E{  6 1 1 1  (^m.fn)  6l  1  'A  fm . ^.'i I }  =  COV {  S  i  1 1  (fm.fri).  S5 1 2dirdn)  } 

=  2o2S,(fm)S1(t„)S12(f 

m+n)  +  °(n)  ' 

E{e112(Wn)  GlllOm.fn)}  =  COVES')  1 2(^m»^n) »  Sm(f  m.(n)} 

=  2o2Si(fm)Si(fn)Si2(Wn)  +  O  Or)' 

where  Si  2(f)  is  the  cross-power  spectrum. 


The  variance  of  &d  is  then 


VAR{od(fm,fn)  }  =  - +  - 

l1l2(Wn)  HnCWn) 


(33) 


where 


2  pe  f--  12(fm<fn)  Bi  i  i(fm^n)  _ _ 

\|B,,2(fm.f„)|  |B,„(fm,f„)|  °,,2<Wn) 


Gi  l2(Wn)  = 


Bn2(Wn)|  |Bln(Wn)| 
Si(m)S1(n)S12(m+n) 


This  expression  for  the  variance  of  the  phase  difference  holds  without  making 
the  time-delay  restriction  implied  by  Eq.  22. 

VII.  EFFECTS  OF  NOISE 

Now  suppose  that  we  observe  the  two  processes  with  additive  Gaussian 
noise  processes  that  are  independent  of  the  signals.  Then  the  cross¬ 
bispectrum  of  the  signals  plus  noise  is  the  SAME  as  that  of  the  signals  only. 
This  result  holds  if  certain  cross-bispectra  of  the  signals  and  noise  are  zero,  but 
we  will  stick  with  the  stronger  independence  assumption  to  simplify  exposition. 

If  we  denote  the  ratio  of  the  signal  and  noise  power  spectra  of  the  ith 
channel  by  pj(f),  then  the  variance  of  the  phase  difference  for  the  signal  plus 
noise  case  is 


VAR{<^N(U,fn)}  =  -—2^  “  [  1  +Pl'(W]  [  1  +Pi’(fn)]  [  1  +P2  (Wm)] 


1  nj(fm.fn) 


+  „  0 - i  1  -*-p11(fm)j  [  1  +Pi1(fn)j !  1  -*-pi1(Wm)] 

I  lll(fm.fn) 


Re  f  [ ,  .p,'(l„)][l  ♦p,' '(.„)]  f  Vp,’(U.„W0„.. 


B  M frTi.fn  j|  j  B  n  i  )  j 


Gi 


W^ijOrrun) 
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Wi2(S)  and  Wi2^)  are  the  signal  and  noise  (complex)  coherency  spectrum 
defined  as 


W12(f) 


_ Si2(0___ 

^Sn(f)S22(f) 


Equation  35  is  the  general  expression  for  the  variance  of  the  phase  difference 
for  signal  and  noise  of  arbitrary  coherence. 


If  we  now  invoke  the  assumption  of  Eq.  22  that  the  signals  are  time 
delayed  versions  of  each  other  and  the  additional  assumption  that  the  noises 
have  approximately  the  same  power  spectral  densities,  then  the  variance  of  the 
phase  difference  can  be  simplified  to 


VAR{Od+N(fm,fn)} 


_2o!  __  [  +p-i(fm)j  j  t  +p-1(fn)!  i -ReM^f^-^-*]}  . 

1  11l(Wn) 


(36) 


This  is  the  expression  for  the  variance  of  the  phase  difference  for  correlated 
noise.  If  the  noise  processes  are  time  delayed  versions  of  one  another  with 
delay  time  xn.  then  the  term  in  the  braces  { }  is  just  1-cos(2jifm+n('Crrx))>  the 

variance  is  dependent  upon  the  difference  in  time  delays  between  the  signal 
and  noise.  On  the  other  hand,  if  the  noise  processes  are  independent  of  one 
another,  then  the  term  in  the  braces  {  }  is  1  and  the  variance  of  the  phase 
differences  reduces  in  this  case  to 


VAR{o>t*N(fm,f„)}=— 2s?—  [upAwKupAwip'AWn)  •  (37) 

fill  (fm.fn) 


Note  that  the  asymptotic  variance  of  the  phase  difference  for  the 
bifrequency  pair  (fm,  fn)  decreases  with  the  square  of  the  skewness  and  with 
(approximately)  the  cube  of  the  signal-to-noise  ratio.  The  term  o3=(NAn^)’1 
determines  the  rate  of  convergence  of  the  estimator  to  the  true  phase  difference 
as  N  -» oo. 
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VIII.  COMPARISON  TO  THE  CROSS-POWER  SPECTRUM  PHASE 


The  more  common  approach  to  estimating  a  time  delay  is  by  computing 
the  cross-power  spectrum  and  its  inverse  Fourier  transform,  the  cross¬ 
correlation.  The  parallels  to  Eqs.  23  and  25  for  the  cross-power  spectrum  are 


and 


Si2(0  =  Si(f)e'2nft 


(38) 


j  =  <1*12(0 

2nf 


(39) 


where  Oi2(f)  is  the  estimate  of  the  phase  cf  the  cross-power  spectrum.  If  the 
cross-power  spectrum  is  estimated  in  a  manner  similar  to  the  bispectrum 
estimate  presented  in  Section  II,  and  if  the  cross-power  spectrum  phase  is 
estimated  as  the  arctangent  of  the  ratio  of  the  imaginary  to  the  real  parts  of  the 
cross-power  spectrum,  then  from  the  Taylor  series  expansion  of  the  arctangent 
(similar  to  Eq.  17)  it  can  be  shown  that  the  variance  of  the  cross-power  spectrum 
phase  is 


VAR  012(f)  = 


Op12 

|W12(f)|2 


(40) 


where  =  (2NAn)'1- 

For  the  case  of  signal-plus-noise,  the  variance  is  given  by 
VAR  0^N(f)  = 

(41) 


Iw?/ 

+  + 

w?2(wV2>  +  (w?2>  w?2 

.0  +pi1(o)(i  +p21(o) 

(1+Pl(0)(1+P2(0)  /[T 

+Pl'(f))(l  +p2'(t))(1  +Pl(f)X1  +P2(0) 

This  is  analogous  to  Eq.  35  for  the  cross-bispectrum. 


If  the  signal  is  assumed  to  be  perfectly  coherent  and  the  noises  have 
approximately  the  same  power  spectral  densities,  then  in  analogy  to  Eq.  36,  the 
variance  of  the  cross-power  spectrum  phase  simplifies  to 


VAR  <&N(f)  =  Op, 


P’2 


(i+p;'0))2 


wVjg  (vVf2)'  +  wV2  1  ■ 1 

tupi(f»2  +  (i+p;1(f))(i+p1()))! 


(42) 


Finally,  if  the  noises  are  incoherent,  then  the  variance  further  reduces  to  0 

VARO?2N(f)  =  o2.?(Upi1(f))2  .  (43) 

This  is  to  be  compared  with  Eq.  37  for  the  cross-bispectrum  phase  difference.  • 


The  question  that  now  arises  is  in  the  case  of  uncorrelated  noise,  are 
there  conditions  under  which  the  cross-bispectrum  time  delay  estimation 
method  will  provide  a  more  accurate  estimation  of  the  time  delay  than  the  cross-  • 

power  spectrum  method.  This  essentially  involves  a  comparison  of  Eqs.  37  and 
43.  Note  that  the  variance  of  the  cross-bispectrum  phase  difference  is 
approximately  (for  low  signal-to-noise  ratios)  inversely  proportional  to  the  cube 
of  the  signal-to-noise  ratio,  whereas  for  the  cross-power  spectrum  phase  the  • 

variance  is  inversely  proportional  to  the  square  of  the  signal-to-noise  ratio.  This 
implies  that  for  low  signal-to-noise  ratios  the  variance  will  increase  more  rapidly 
for  the  cross-bispectrum  phase  difference  than  for  the  cross-power  spectrum 
phase.  However,  the  variance  of  the  cross-bispectrum  phase  difference  is  also  • 

inversely  proportional  to  the  square  of  the  skewness  function  of  the  signal, 
implying  that  larger  values  of  the  skewness  function  can  partially  offset  the  more 
pronounced  effect  that  lower  signal-to-noise  ratios  have  on  the  cross- 
bispectrum  phase  difference.  • 

Sample  size  also  contributes  differently  to  the  two  variances.  For  the 
cross-bispectrum  phase  difference,  c2=(2Na2n)-1  ,  whereas  for  the  cross¬ 
power  spectrum  phase,  Op. 2  =(2NAn)'1  ■  If  An  is  chosen  to  be  on  the  order  of  • 

N'1^  in  both  cases,  then  o2~0(1)  while  Op-?~-0(N‘"*/2)-  Thus  the  variance  will 
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reduce  more  rapidly  as  sample  size  increases  for  the  cross-power  spectrum 
phase  than  for  the  cross-bispectrum  phased  difference. 

Shown  in  Fig.  5  are  plots  of  the  variances  of  the  cross-bispectrum  phase 
difference  and  cross-power  spectrum  phase  as  a  function  of  signal-to-noise 
ratio.  For  Eq.  37,  the  signal-to-noise  ratios  are  taken  to  be  the  same  at  the  three 
frequencies,  and  r2  was  set  to  1 .  For  both  Eqs.  37  and  43,  o2  was  set  to  1 .  For 
this  case,  the  comparison  shown  in  Fig.  5a  demonstrates  that  for  low  signal-to- 
noise  ratios  the  cross-bispectrum  phase  difference  can  have  a  much  larger 
variance  than  the  cross-power  spectrum  phase  due  to  its  cubic  dependency  on 
signal-to-noise  ratio.  However,  at  high  signal-to-noise  ratios  its  variance  is 
slightly  smaller  than  the  variance  of  the  cross-power  spectrum  phase. 

In  interpreting  these  results,  it  should  be  kept  in  mind  that  the  variance  of 
the  cross-bispectrum  phase  difference  is  inversely  proportional  to  r2,  so  that 
larger  values  of  F2  than  presented  in  Fig.  5  will  reduce  its  variance.  On  the 
other  hand,  o2  and  o^12  were  both  set  to  1  for  the  two  cases,  even  though  o£12 
can  be  made  to  reduce  more  rapidly  than  o2  and  still  maintain  consistency  as 
sampling  size  increases.  Thus  an  appropriate  choice  of  averaging  can  result  in 
a  smaller  cross-power  spectrum  phase  variance  than  presented  in  Fig.  5. 
Clearly,  for  low  signal-to-noise  ratios  and  uncorrelated  noise,  the  cross¬ 
bispectrum  will  not  usually  provide  any  performance  advantage.  The  obvious 
advantage  of  the  cross-bispectrum  is  when  correlated  Gaussian  noise  is 
present,  which  results  in  a  bias  for  the  cross-power  spectrum. 

VII.  SUMMARY 

In  this  paper  we  have  derived  the  statistical  properties  of  a  consistent 
time  delay  estimator  for  non-Gaussian  signals  in  correlated  noise.  We  have 
shown  the  dependence  of  the  performance  of  the  time  delay  estimator  on  the 
skewness  of  the  signal,  the  correlation  of  the  noise,  the  signal-to-noise  ratio, 
and  the  sample  size.  The  asymptotic  variance  of  the  time  delay  estimator  is 
inversely  proportional  to  the  square  of  the  skewness  and  approximately 
inversely  proportional  to  the  cube  of  the  signal-to-noise  ratio. 
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